Last updated on 8/29/24

## Understand window functions using OVER and PARTITION BY

Window functions are often used to (among other things):

• calculate cumulative sums

• order rows within a partition.

As you will see, there are many similarities between aggregations and window functions.

Let’s wave our little magic want over this...

``````SELECT incorporation_date, min(id) OVER() FROM entity ;
``````

Tada! No more error! When we insert  `OVER()`  , the aggregate function  `min()`  returns as many rows as there are in entity. This is not normal behavior for an aggregate function!

Yes, it’s true... there are many similarities between aggregations and window functions. Both are executed in two phases, and they differ only in respect to phase two:

 Agrégation Window Function Phase 1 Partitioning by partitioning attributes Partitioning by partitioning attributes Phase 2 Application of an aggregate function Application... of an aggregate function with modified behaviorOR a ranking function. Result As many rows as aggregates As many rows as the original table

Window functions are defined using an OVER clause. The OVER clause has two capabilities, each with its own behavior! Here are the corresponding syntaxes:

• `OVER (PARTITION BY ...)`

• `OVER (PARTITION BY ... ORDER BY ...)`

Let’s analyze these two queries, which use the nb_entities table from the previous chapter:

``````-- REQUETE n°1SELECT sum(cnt_entities) FROM nb_entities GROUP BY id_intermediary ;-- REQUETE n°2SELECT sum(cnt_entities) OVER (PARTITION BY id_intermediary) FROM nb_entities ;
``````

Query no. 2 functions almost like the aggregation in query 1; the only thing that’s different is what  `sum`  returns.

In query no. 2, if  `sum`  receives as input the list of values  `[10, 1, 5]`  , it won’t return  `16`  ; it will return  `[16, 16, 16].`

In both cases, we are calculating the total number of companies created by each intermediary. But here is how the result is presented using a window function:

``````SELECT id_intermediary,     jurisdiction,     cnt_entities,     sum(cnt_entities) OVER (PARTITION BY id_intermediary) AS entities_by_intermediary FROM nb_entities ;
``````
 id_intermediary jurisdiction cnt_entities entities_by_intermediary 4000 SAM 10 16 4000 PMA 1 16 4000 CYP 5 16 4001 SAM 3 5 4001 NEV 2 5 [...] [...] [...] [...]

#### With an aggregate function

When we add  `ORDER BY`  to the  `OVER`  clause, the aggregate function behaves in still another way!

Indeed, it considers these values to be ranked. As before, the aggregate function is going to return as many rows as it received as input. But it’s going to do this:

1. Perform a calculation on the 1st value, and return the result;

2. Perform a calculation on the 1st and 2nd values, and return the result;

3. Perform a calculation on the 1st, 2nd, and 3rd values, and return the result;

4. And so on.

Thus, if the  `sum`  function receives `[10, 5, 1]`, it will return `[10, 15, 16]`, because

• 10 = 10

• 10 + 5 = 15

• 10 + 5 + 1 = 16

Here,  `sum`  behaves like a cumulative sum!

#### With a ranking function

If we use a ranking function (for example,  `rank`  ), it will simply return the order of the rows. These rows have been previously sorted, thanks to  `ORDER BY` .

A little example: we are in a race, and have recorded the time it took each runner to finish the course. To find out who will be on the podium, we need to rank the athletes according to their time:

``````SELECT     firstname,     time,     rank() OVER (ORDER BY temps) FROM course ;
``````
 firstname time rank() Naomi 3 min 15 s 1 Sarah 3 min 19 s 3 Sonia 3 min 16 s 2 Luke 10 min 39 s 4

So let’s assign a rank to each jurisdiction for each intermediary. Let’s call our ranking  `rank`  . To assign a  rank  to a jurisdiction, I sort  `cnt_entities`  in descending order. Since I want to obtain a ranking for each intermediary, I write  `PARTITION BY id_intermediary`  .

We can also calculate a cumulative sum! Let’s calculate the cumulative sum of  ` cnt_entities`  for each intermediary, sorting the rows in descending order of  ` cnt_entities`  . Let’s call our cumulative sum  `cum_sum`  .

``````SELECT id_intermediary,    jurisdiction,    cnt_entities,    rank()        OVER(PARTITION BY id_intermediary ORDER BY cnt_entities DESC) AS rank,    sum(cnt_entities)         OVER(PARTITION BY id_intermediary ORDER BY cnt_entities DESC) AS cum_sumFROM nb_entities ;
``````

Here's the result:

 id_intermediary jurisdiction cnt_entitites rank cum_sum 4000 SAM 10 1 10 4000 CYP 5 2 15 4000 PMA 1 3 16 4001 SAM 3 1 3 4001 NEV 2 2 5 [...] [...] [...] [...] [...]

• Window functions are very similar to aggregations, except they don’t change the number of rows.

• The  `OVER`  keyword can be used to form groups of rows (aggregate functions) or to order rows within a partition (ranking functions).

• The keyword  `OVER`  can be used to calculate  `cumulative sums`  or  `determine rankings` .

Ever considered an OpenClassrooms diploma?
• Up to 100% of your training program funded
• Flexible start date
• Career-focused projects
• Individual mentoring
Find the training program and funding option that suits you best