Do you know that it is possible to nest queries inside other queries? I assume you do—because we mentioned it fleetingly earlier in this course!
Let’s look at an example. In the chapter on HAVING clauses, I explained that the HAVING keyword is used to apply a restriction to a table after aggregation. But if HAVING seems too complicated, you can use a nested query (aka subquery) instead:
-- QUERY n°1
SELECT *
FROM (SELECT id_address, count(*) AS cnt FROM entity GROUP BY id_address) a
WHERE a.cnt > 500 ;
-- QUERY n°2, same as query n°1:
SELECT id_address, count(*) AS cnt
FROM entity
GROUP BY id_address
HAVING count(*) > 500 ;
What do these queries return? The addresses of over 500 companies.
The IN Keyword
These two queries only return the address identifier, which is a whole number. How can we obtain the address in a more readable form?
We could create a join with the address table, but we could also use the IN
operator, whereby we would request the following:
Give me the rows of the address table in which the identifier is found IN the id_address column of the table obtained by query no. 2.
Once again, we find ourselves with a nested query, or sub-query, using the results of query no. 2 (above on this page):
SELECT *
FROM address a
WHERE a.id_address IN (
SELECT id_address
FROM entity
GROUP BY id_address
HAVING count(*) > 500
);
The ANY and ALL Keywords
Are there intermediaries that are more active than the two intermediaries we found earlier (Bargain Banking S.A. and Plouf Financial Services Corp.)?
First, we have created a nb_entities table, specifying for each intermediary the number of companies it assisted in creating:
id_intermediary | intermediary_name | cnt_entities |
5000 | Pacher Banking S.A. | 2184 |
5001 | Plouf Financial Services Corp. | 720 |
[...] | [...] | [...] |
To answer our question, we need to use an operator followed by ALL
or ANY:
SELECT * FROM nb_entities WHERE cnt_entities > ALL(SELECT cnt_entities FROM nb_entities WHERE intermediary_id IN (5000,5001,5002));
The EXISTS Keyword
Let’s go back to the query we saw above, in the section on the IN
operator, and put ourselves in the shoes of the RDBMS that is being asked to execute this query:
CREATE TABLE nb_entities AS
SELECT i.id AS intermediary_id,
i.name AS intermediary_name,
e.jurisdiction,
e.jurisdiction_description,
count(*) AS cnt_entities
FROM intermediary i, assoc_inter_entity a, entity e
WHERE a.entity = e.id AND a.inter = i.id
GROUP BY i.id, i.name, e.jurisdiction, e.jurisdiction_description;
Normally, you are supposed to check whether the WHERE
clause is true or false for each row of the address table, right? The condition will be tested as many times as there are rows.
So far so good. But here, in the WHERE
clause, we find a nested subquery (from row 4 to row 7). Are you going to recalculate this subquery each time you check the WHERE
condition, for every single address row??? That would be silly, because the result of this subquery is always the same. A better idea would be to store this result, rather than recalculate it every time!
Except... if this subquery depends on the row for which we are testing the WHERE
clause, it is no longer possible to cache the result. :( And if you have to recalculate the subquery every time, it’s going to take forever!
If you find yourself in this situation, it is preferable (from a performance standpoint) to use the EXISTS
operator, which tests whether a subquery contains at least one row. If there is at least one row, it stops executing the subquery, because it is satisfied!
Go Further: Using IN on Multiple Columns
In this chapter, we checked whether a value was “in” a specified list of values.
It is also possible to check whether a couple of valuesvalue1, value2
is found in a list of couples c1, c2
, using the following approach:
SELECT * FROM t1 WHERE t1.valeur1, t1.valeur2 IN (SELECT c1, c2 FROM t2) ;
Go Further: Temporary Tables
Theoretically, there is no real limit to how many queries you can nest. It’s even possible to nest nested queries within other queries… and so on! :soleil:
In order to avoid long, drawn-out queries, know that it is sometimes possible (depending on your RDBMS) to create temporary tables. This involves creating new tables for storing the result of a query so you can use that result in another query.
For example, query no. 1 at the top of this page is equivalent to the following code:
CREATE TABLE nb_entities AS
SELECT i.id AS intermediary_id,
i.name AS intermediary_name,
e.jurisdiction,
e.jurisdiction_description,
count(*) AS cnt_entities
FROM intermediary i, assoc_inter_entity a, entity e
WHERE a.entity = e.id AND a.inter = i.id
GROUP BY i.id, i.name, e.jurisdiction, e.jurisdiction_description;
These temporary tables will be deleted when you close your RDBMS session.
Summary
To test whether a specified value is found in the result of a query, use
IN
.Suppose B is a list of values returned by a query. To test whether a condition (for example
A < B
) is true for...... all values of
B
, useALL
;... at least one value of
B
, useANY
.
The
EXISTS
operator tests whether the subquery contains at least one row.