Now let’s look at aggregation!
You will recall that an aggregation has two parts:
a set of partitioning attributes
one or more aggregate function(s)
Now we’re going to see how to implement these in SQL.
I must confess: this chapter is a little long—but I really want you to grasp these concepts!
The chapter can essentially be summed up in one line of code:
SELECT status, count(*) FROM entity GROUP BY status;
We have placed the partitioning attribute (here,
status ) after the keyword
GROUP BY , and the aggregate function
COUNT() in the
SELECT clause. Alternatively, we could add the partitioning attributes in the
And now you know everything there is to know (well, almost)!
Thinking It Through...
First let’s test the aggregate functions and leave the partitioning attributes for later.
A little reminder: an aggregate function takes multiple rows and returns a single row.
For example, if we write…
SELECT max(incorporation_date) AS maxi FROM entity ;
max function takes the attribute
incorporation_date , which has multiple rows (all of the rows of the entity table), and returns an attribute (we’ll call it
maxi ) on a single row.
You really should execute this query in order to fully understand it.
Inputs and outputs
Most of the basic aggregate functions of SQL (that are predefined by the language) take a single attribute and also return a single attribute. This is the case for the
max function we just saw.
However, there is a function that doesn’t adhere to this rule: the
count function. The
count function returns a whole number: the number of rows it received as input. To count the number of rows, there’s no need to know the value of any argument of these rows! So we simply assign it the character
SELECT count(*) FROM entity ;
A little puzzle
If entity contains 1,000 rows, how many rows will each of these two queries return?
SELECT incorporation_date FROM entity ;SELECT min(id) FROM entity ;SELECT incorporation_date, min(id) FROM entity ;
The first query will return 1,000 rows. The second will return only one row. However, for the third query, your RDBMS should return an error, because it won’t know whether to return 1 row or 1,000 rows.
On the other hand, this query will work, and return 1 row:
SELECT max(incorporation_date), min(id) FROM entity ;
This is because the two columns we are querying both result from an aggregate function.
Can we conclude, then, that in the
SELECT clause, we can use an aggregate function only if all of the other columns also result from an aggregate function?
Yes! Unless your query contains a
GROUP BY . If that’s the case, there’s more to consider.
And that’s what we’ll be looking at in the next section.
In an aggregation, the partitioning attributes are specified in the
GROUP BY clause: :
SELECT count(*) FROM entity GROUP BY status ;
Here, we are counting the number of companies for each of the statuses found in entity.
Yes, but this query only returns a column with numbers! We don’t know what the numbers correspond to.
That’s right! To solve this problem, the status value can be displayed as follows:
SELECT status, count(*) FROM entity GROUP BY status ;
... which brings us back to the rule we mentioned earlier, which is completed as follows to form a Golden Rule:
La règle d'or de l'agrégation
Let’s Continue Our Investigation
Ready to return to the main query of the previous chapter? You can do it!
SELECTi.id as intermediary_id,i.name as intermediary_name,e.id as entity_id,e.name as entity_name,e.status as entity_statusFROMintermediary i,assoc_inter_entity a,entity eWHEREa.entity = e.idAND a.inter = i.idAND e.name = 'Big Data Crunchers Ltd.' ;
You will recall that this links intermediaries to entities. It enabled us to find two intermediaries with identifiers 5000 and 5001.
Here, we want to know the number of entities (companies) these two intermediaries created in each of the jurisdictions (for our purposes, jurisdiction = country ). So we’re going to have to use GROUP BY!
As you may have guessed, our partitioning attributes will be:
intermediary (taking, for example, its identifier and its name)
Let’s begin by amending our previous query. First of all, this earlier query is limited to the entity Big Data Crunchers Ltd. Here, we want to analyze all of the entities created by the two intermediaries. So let’s replace row 14 with
AND (i.id = 5000 OR i.id = 5001) .
Let’s add our three partitioning attributes in the
GROUP BY clause, then delete all of the columns that are not partitioning attributes from the
SELECT clause (that is, rows 4, 5, and 6).
All that’s left is our aggregate function,
COUNT() , because we want to count the number of entities. Here’s the result:
SELECTi.id as intermediary_id,i.name as intermediary_name,e.jurisdiction,count(*)FROMintermediary i,assoc_inter_entity a,entity eWHEREa.entity = e.idAND a.inter = i.idAND (i.id = 5000 OR i.id = 5001)GROUP BYi.id, i.name, e.jurisdiction;
Go Further: Aggregate Functions and Scalar Functions
In the chapter covering the SELECT clause, we saw that it is possible to apply functions in the
SELECT clause. Are these aggregate functions?
SELECT abs(id) AS valeur_absolue FROM entity ;
No. In this query (which returns the absolute value of the company identifier),
abs returns a result for as many rows as it received as input. If there are 1,000 companies in the entity table, then there will be 1,000 absolute values to calculate. However, an aggregate function returns only a single row.
abs function is referred to as a scalar function.
In aggregations, the aggregate functions are placed in the
... and the partitioning attributes are placed in the
SELECTclause, an aggregate function can be used only if all of the other columns also result from an aggregate function OR are present in the