Now we’re going to see how to write a Union, a Difference, and an Intersection. Specifically, we’re going to test these binary operators on the tables entity and intermediary.
The Status of Our Investigation
Entity contains a list of companies. Intermediary contains a list of intermediaries. In some cases, these intermediaries are also companies, so there might be some companies that are found in both tables.
Let’s test to see if this is the case with Big Data Crunchers Ltd.! If it is, it will mean that this company has acted as an intermediary.
Union, Intersection, and Difference
To use a binary operator on two tables, these two tables have to be of the same schema. This is not the case with entity and intermediary.
To solve this problem, we’re going to put them in the same schema, using a projection. So instead of using the entity and intermediary tables as they are, we’ll take the tables that result from these two projections instead, because they will definitely be of the same schema:
SELECT name, id_address FROM intermediary ;SELECT name, id_address FROM entity ;
The UNION, EXCEPT, and INTERSECT Keywords
To retrieve the list of companies in entity and the list of all the intermediaries, you need to formulate the following query, using
The list of companies in the entity table that are not intermediaries is obtained by:
SELECT name, id_address FROM entityEXCEPTSELECT name, id_address FROM intermediary ;
If we want to know which companies are found in both the entity table and the intermediary table:
SELECT name, id_address FROM entityINTERSECTSELECT name, id_address FROM intermediary ;
Unfortunately, yes, there is a "but". I hate to be the one to tell you this, but difference and intersection are rarely performed in this way.
Thinking It Through…
If we want to know whether a row in one table is found in another table, then we have to compare the value of each of its attributes, one to one. If there are a lot of attributes, this is sub-optimal!
There is a simpler way to do it, but it can be used only if both tables are extracted from an original table containing at least one candidate key.
Let’s take a person table containing people of French nationality, described in 20 columns. The social security number column is one of the candidate keys because it identifies each individual. In this original person table, we create an adult table, which contains people over the age of 18, and a Houstonians table containing people who live in the city of Houston.
To find out whether a person appears in both adult and Houstonians, there’s no need to compare all 20 columns: just look to see whether his or her social security number is found in both tables!
The other method
Here’s how it works:
SELECT *FROM adultWHERE social_security_number IN (SELECT social_security_number FROM houstonians) ;
Translated into English, this query would be:
Select all the rows of the adult table for which the social security number is found in:
Select all the social security numbers in the Houstonians table.
The preceding code executes an intersection operation. To execute a difference operation, simply replace the
IN keyword with
NOT IN .
UNIONto execute a union operation.
EXCEPTto execute intersection and difference operations, respectively.
Where you can use primary keys, there are better ways than INTERSECT and EXCEPT !
A Little Exercise!
The database is online. Get some practice by rewriting the queries found in this chapter. Don’t hesitate to change them, play with them, and “see what happens.”