Now we come to a fundamental concept of relational algebra that you use all the time: the Join.
Let’s go back to the concept of foreign keys, which are used to link relations. We know that the relations are linked, but we don’t yet know how to use these links. That’s what we’re going to see now! :D
Let’s return to the example of our two relations apple and variety.
identifier | weight | diameter | color | name_variety |
1 | 151 g | 8.3 cm | Red | Red Delicious |
2 | 169 g | 9.1 cm | Green | McIntosh |
3 | 134 g | 8.0 cm | Green | McIntosh |
label | price_per_kilo | maturity | taste |
Red Delicious | 3.19 | late Sept. | Sweet |
Macintosh | 3.49 | mid. Sept. | Tart |
In the chapter Create Links between Your Relations using Foreign Keys we wanted to know the price per kilo of a given apple, and so we wrote the following sentence:
If I want to know the price per kilo of a given apple, I must first know its variety. Then I’ll need to locate the corresponding row in the variety table in order to know its price.
Link relations using Inner Joins
More formally, we say that we perform an inner join of the apple relation and the variety relation according to the condition apple.name_variety = variety.label
.
And yes, we must always specify the join condition. Here, the condition states that the name_variety column of apple references the label column of variety.
Here are the results:
apple. | apple. | apple. | apple. | apple. | variety. | variety. | variety. | variety. |
1 | 151 g | 8.3 cm | red | Red Delicious | Red Delicious | 3.19 | late Sept. | Sweet |
2 | 169 g | 9.1 cm | green | McIntosh | McIntosh | 3.49 | mid. Sept. | Tart |
3 | 134 g | 8.0 cm | green | McIntosh | McIntosh | 3.49 | mid. Sept. | Tart |
We have achieved our objective. Knowing my apple’s identifier gives me direct access to its price per kilo: it’s on the same row!
Joining multiple columns
You will recall that a primary key can consist of multiple columns. You should also know that we can join more than one column (let’s say two columns, for example).
This being the case, a foreign key that references this primary key will necessarily consist of two columns. The join condition will be as follows:
relation1.attribute1 = relation2.attribute3 AND relation1.attribute2 = relation2.attribute4
id [PK] | attribute1 | attribute2 | [...] |
0 | 398 | AX | [...] |
[...] | [...] | [...] | [...] |
attribute 3 [PK] | attribute 4 [PK] | [...] |
398 | AX | [...] |
[...] | [...] | [...] |
Link relations using outer joins
What happens if we don’t know the variety of apple 3? In this case, the foreign key will contain a null value, and will, therefore, have no corresponding value in the variety table.
Identifier | weight | diameter | color | name_variety |
1 | 151g | 8.3 cm | red | Red Delicious |
2 | 169g | 9.1 cm | green | Braeburn |
3 | 134g | 8.0 cm | green | NULL |
Our previous join condition apple.name_variety = variety.label
will no longer be met, because the apple.name_variety
cell is null.
Left outer joins
If we also want to retain apples whose varieties are not known to us, we have to apply an outer join. If the apple table is on the left, and the variety table is on the right, we perform a left outer join, because we are retaining all of the rows of the table on the left:
apple. | apple. | apple. | apple. | apple. | variety. | variety. | variety. | variety. |
1 | 151g | 8.3 cm | red | Red Delicious | Red delicious | 3.19 | late Sept. | Sweet |
2 | 151g | 9.1 cm | green | Braeburn | Braeburn | 3.49 | mid. Sept. | Tart |
3 | 134g | 8.0 cm | green | NULL | NULL | NULL | NULL | NULL |
Right outer joins
In the same way, if there were no apples of the Gala variety, but we still wanted to retain the information about this variety after the join, we would need to apply a right outer join. This way, we would retain all of the rows of the table on the right (variety), even if some of the varieties had no corresponding value in the apple table:
apple. | apple. | apple. | apple. | apple. | variety. | variety. | variety. | variety. |
1 | 151 g | 8.3 cm | Red | Red Delicious | Red Delicious | 3.19 | late Sept. | Sweet |
2 | 169 g | 9.1 cm | Green | Braeburn | Braeburn | 3.49 | mid. Oct. | Sweet/Tart |
NULL | NULL | NULL | NULL | NULL | Gala | 3.19 | mid. Sept. | Sweet |
NULL | NULL | NULL | NULL | NULL | McIntosh | 2.99 | mid. Sept. | Tart |
Notice that the results of the right outer join retain all the records from the right table (variety) and all matching records from the left table (apple).
Full joins
And if we want to retain all of the information in both apple and variety, I have a third type of outer join on hand: the full join! Here’s what it looks like:
apple. | apple. | apple. | apple. | apple. | variety. | variety. | variety. | Variety. |
1 | 151 g | 8.3 cm | red | Red Delicious | Red Delicious | 3.19 | late September/early October | Sweet |
2 | 169 g | 9.1 cm | green | Braeburn | Braeburn | 3.49 | mid September | Sweet/Tart |
3 | 134 g | 8.0 cm | green | NULL | NULL | NULL | NULL | NULL |
NULL | NULL | NULL | NULL | NULL | Gala | 3.19 | mid September | Sweet |
NULL | NULL | NULL | NULL | NULL | Macintosh | 2.99 | mid September | Tart |
Notations
Here is the notation I recommend for the inner join:
Join (apple, variety, apple.name_variety = variety.label)
For right, left, and full outer joins:
JoinLeft (apple, variety, apple.name_variety = variety.label)
JoinRight (apple, variety, apple.name_variety = variety.label)
JoinFull (apple, variety, apple.name_variety = variety.label)
Go Further: Natural Joins
And now, a little trick that’s actually very useful: the natural join.
A natural join is a regular join. If the two relations being joined have columns that appear in both relations (exactly the same name in both tables), the join can be implicit.
Thus, if I change the name of the “label” column in the variety table to name_variety, both tables will have this column in common. If we want to join all of the columns that have the same name in both relations (here there is only one such column), we can use the natural join.
With the natural join, we no longer need to specify a condition, because it is implicit—the columns have the same name!
So...
Join (apple, variety, apple.name_variety = variety.name_variety)
is equivalent to
JoinNatural (apple, variety)
.
Summary
A join "combines" two tables based on a specified condition.
An inner join doesn’t necessarily retain all of the information of the two tables being joined. It only includes matching rows between tables.
In order to retain all of the information of one (or both) of the tables, you need to apply an outer join.