Window functions are often used to (among other things):
calculate cumulative sums
order rows within a partition.
As you will see, there are many similarities between aggregations and window functions.
Let’s wave our little magic want over this...
SELECT incorporation_date, min(id) OVER() FROM entity ;
Tada! No more error! When we insert
OVER() , the aggregate function
min() returns as many rows as there are in entity. This is not normal behavior for an aggregate function!
Yes, it’s true... there are many similarities between aggregations and window functions. Both are executed in two phases, and they differ only in respect to phase two:
Partitioning by partitioning attributes
Partitioning by partitioning attributes
Application of an aggregate function
As many rows as aggregates
As many rows as the original table
Window functions are defined using an OVER clause. The OVER clause has two capabilities, each with its own behavior! Here are the corresponding syntaxes:
OVER (PARTITION BY ...)
OVER (PARTITION BY ... ORDER BY ...)
OVER (PARTITION BY...)
Let’s analyze these two queries, which use the nb_entities table from the previous chapter:
-- REQUETE n°1SELECT sum(cnt_entities) FROM nb_entities GROUP BY id_intermediary ;-- REQUETE n°2SELECT sum(cnt_entities) OVER (PARTITION BY id_intermediary) FROM nb_entities ;
Query no. 2 functions almost like the aggregation in query 1; the only thing that’s different is what
In query no. 2, if
sum receives as input the list of values
[10, 1, 5] , it won’t return
16 ; it will return
[16, 16, 16].
In both cases, we are calculating the total number of companies created by each intermediary. But here is how the result is presented using a window function:
SELECT id_intermediary, jurisdiction, cnt_entities, sum(cnt_entities) OVER (PARTITION BY id_intermediary) AS entities_by_intermediary FROM nb_entities ;
OVER (PARTITION BY... ORDER BY...)
With an aggregate function
When we add
ORDER BY to the
OVER clause, the aggregate function behaves in still another way!
Indeed, it considers these values to be ranked. As before, the aggregate function is going to return as many rows as it received as input. But it’s going to do this:
Perform a calculation on the 1st value, and return the result;
Perform a calculation on the 1st and 2nd values, and return the result;
Perform a calculation on the 1st, 2nd, and 3rd values, and return the result;
And so on.
Thus, if the
sum function receives
[10, 5, 1], it will return
[10, 15, 16], because
10 = 10
10 + 5 = 15
10 + 5 + 1 = 16
sum behaves like a cumulative sum!
With a ranking function
If we use a ranking function (for example,
rank ), it will simply return the order of the rows. These rows have been previously sorted, thanks to
ORDER BY .
A little example: we are in a race, and have recorded the time it took each runner to finish the course. To find out who will be on the podium, we need to rank the athletes according to their time:
SELECT firstname, time, rank() OVER (ORDER BY temps) FROM course ;
3 min 15 s
3 min 19 s
3 min 16 s
10 min 39 s
Cumulative Sums and Ranking Numbers
So let’s assign a rank to each jurisdiction for each intermediary. Let’s call our ranking
rank . To assign a rank to a jurisdiction, I sort
cnt_entities in descending order. Since I want to obtain a ranking for each intermediary, I write
PARTITION BY id_intermediary .
We can also calculate a cumulative sum! Let’s calculate the cumulative sum of
cnt_entities for each intermediary, sorting the rows in descending order of
cnt_entities . Let’s call our cumulative sum
SELECT id_intermediary, jurisdiction, cnt_entities, rank() OVER(PARTITION BY id_intermediary ORDER BY cnt_entities DESC) AS rank, sum(cnt_entities) OVER(PARTITION BY id_intermediary ORDER BY cnt_entities DESC) AS cum_sumFROM nb_entities ;
Here's the result:
A Little Exercise!
The database is online. Get some practice by rewriting the queries found in this chapter. Don’t hesitate to change them, play with them, and “see what happens.”
Window functions are very similar to aggregations, except they don’t change the number of rows.
OVERkeyword can be used to form groups of rows (aggregate functions) or to order rows within a partition (ranking functions).
OVERcan be used to calculate