Advanced SQL Part 2 | Window Functions (PARTITION BY, LAG/LEAD, EXCLUDE).

More from Author
Akash Kumar
Akash

Engineering Lead

7 min read

Welcome to this exploration of Advanced SQL - Window FunctionsPart 2.

** To learn frame type - ROWS, RANGE, and GROUPS, refer to the part 1 of this blog series

RECAP:

What are window functions?

With SQL:2003, the ISO SQL Standard introduced window functions, a new mode of row-based computation:

Copy Code
              
                                              Input            Output
  Aggregate functions(sum, avg, count)    group of rows →  row (one per group)
  window function                         row vicinity →  row (one per row)
              
          
  • Window functions operate on a set of rows ( commonly known as window or frame or row vicinity) and return a single value for each row.
  • The term window describes the set of rows on which the functionoperates.
  • A window function uses values from the rows in a window.

In the Part 1 of this blog series, we learnt how to create a frame using frame_types - ROWS, RANGE and GROUPS. Now we’ll see more about the other functionality that window functions provide.

1. EXCLUDE

Copy Code
            
  <window_function> OVER (ORDER BY A₁,...,Aₙ [ RANGE frame EXCLUDE ])
            
        

In the above table, let's consider Q4 as the current row
Frame is BETWEEN 1 PRECEDING AND CURRENT ROW
EXCLUDE NO OTHERS -
EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES

Query 1 : EXCLUDE example

Copy Code
            
  SELECT w.row                          AS "current row",
      w.a,
      array_agg(w.row) OVER win      AS "rows in frame",
      array_agg(w.row) OVER win1 AS "EXCLUDE CURRENT ROW",
      array_agg(w.row) OVER win2 AS "EXCLUDE GROUP",
      array_agg(w.row) OVER win3 AS "EXCLUDE TIES"
  FROM   sample AS w
  WINDOW win  AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW),
      win1 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
      win2 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE GROUP),
      win3 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE TIES)
  ORDER BY w.a;
            
        
  current row | a | rows in frame | EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES
  ------------+---+---------------+---------------------+---------------+---------------
      Q1      | 1 | {Q1}          |                     |               | {Q1}
      Q2      | 2 | {Q1,Q2}       | {Q1}                | {Q1}          | {Q1,Q2}
      Q3      | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q4,Q5}          | {Q2}          | {Q2,Q3}
      Q4      | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q3,Q5}          | {Q2}          | {Q2,Q4}
      Q5      | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q3,Q4}          | {Q2}          | {Q2,Q5}
      Q6      | 4 | {Q3,Q4,Q5,Q6} | {Q3,Q4,Q5}          | {Q3,Q4,Q5}    | {Q3,Q4,Q5,Q6}
      Q7      | 6 | {Q7,Q8}       | {Q8}                |               | {Q7}
      Q8      | 6 | {Q7,Q8}       | {Q7}                |               | {Q8}
      Q9      | 7 | {Q7,Q8,Q9}    | {Q7,Q8}             | {Q7,Q8}       | {Q7,Q8,Q9}      
        

Explanation -

Let us consider Q5 as our current row for the frame RANGE BETWEEN 1 PRECEDING AND CURRENT ROW

Rows in frame are Q2, Q3, Q4, Q5 - (Q3, Q4 and Q5 are 1 unit as they have the same value and the frame type is RANGE)

EXCLUDE CURRENT ROW - Q2, Q3, Q4 (Q5 is removed as it is the current row)

EXCLUDE GROUP - Q2 (Q3, Q4 and Q5 are removed as it a group)

EXCLUDE TIES - Q2, Q5 (Q3 and Q4 are removed as it was tied with Q5 as a peers)

Django Query 1 : EXCLUDE example

Django does not have explicit functions that perform EXCLUDE operations within window functions.

PARTITION BY

Optionally, we may partition the input table before rows are sorted and window frames are determined:

Copy Code
            
  <window_function>  OVER (PARTITION BY p1 ORDER BY A₁,...,Aₙ [ RANGE frame])          
            
        

1. ROWS

  • Frames never cross partitions.
  • BETWEEN ⋯ PRECEDING AND ⋯ FOLLOWING
  • respects partition boundaries.

Query 2 : PARTITION BY example

Copy Code
            
  SELECT w.row                     AS "current row",
         w.a,
         w.b                       AS "partition",
         COUNT(*)         OVER win AS "frame size",
         array_agg(w.row) OVER win AS "rows in frame",
            SUM(w.a) OVER win AS "∑ a (so far)"
  FROM   sample  AS w  
  WINDOW win AS (PARTITION BY w.b ORDER BY w.a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  ORDER BY w.b, w.a;
            
        

OUTPUT-

current row | a | partition | frame size |  rows in frame   | Σ a (so far)
------------+---+-----------+------------+------------------+--------------
    Q1      | 1 | false     |     1      | {Q1}             |     1         => 1
    Q4      | 3 | false     |     2      | {Q1,Q4}          |     4         => 1 + 3
    Q8      | 6 | false     |     3      | {Q1,Q4,Q8}       |     10        => 1 + 3 + 6
    Q7      | 6 | false     |     4      | {Q1,Q4,Q8,Q7}    |     16        => 1 + 3  + 6 + 6
    Q2      | 2 | true      |     1      | {Q2}             |     2         => 2
    Q5      | 3 | true      |     2      | {Q2,Q5}          |     5         => 2 + 3
    Q3      | 3 | true      |     3      | {Q2,Q5,Q3}       |     8         => 2 + 3 + 3
    Q6      | 4 | true      |     4      | {Q2,Q5,Q3,Q6}    |     12        => 2 + 3 + 3 + 4
    Q9      | 7 | true      |     5      | {Q2,Q5,Q3,Q6,Q9} |     19        => 2 + 3 + 3 + 4 + 7

        

Explanation -

(PARTITION BY w.b ORDER BY w.a ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)
PARTITION BY w.b - table is split into 2 partitions - true partition, false partition
ORDER BY w.a - Inside each partition, data is ordered by 'a-int' column When Q2 is the current row, It'll not consider Q8 as 1 preceding as it is in a different partition.

Django Query 2 : PARTITION BY example

Copy Code
            
  from django.db.models import Avg, F, RowRange, Window, Count, Sum

  window = {
  'partition_by': [F('b')],
  'frame': RowRange(start=None, end=0), // Unbounded preceding and current row
  'order_by': F('a').asc()
  }

  qs = Sample.objects.annotate(
              frame_size = Window(
                  expression=Count('a'), **window
              ), 
              running_sum = Window(
                  expression=Sum('a'), **window
              ),		   
  ).order_by('b')

  for i in qs:
     print(i.row, i.a, i.b, i.frame_size, i.running_sum)
            
        

OUTPUT-

Q1 1 False 1 1
Q4 3 False 2 4
Q7 6 False 3 10
Q8 6 False 4 16
Q2 2 True 1 2
Q3 3 True 2 5
Q5 3 True 3 8
Q6 4 True 4 12
Q9 7 True 5 19

3. LAG/LEAD

Lag/ Lead functions can be used to get the row at the offset +- n relative to the current row.

Copy Code
            
  LAG/LEAD ( e, n, d ) OVER ( [  PARTITION BY p1 ... pm ]
 	                          [ ORDER BY   e1 ... en  ] )
            
        

Scope of LAG/LEAD is the partition. That means members/records outside the partition cannot be accessed.
e - field of row variable
n - offset value +/- 2
d - if offset value is out of range, then return d
If d is not specified, then it returns NULL

The frame of the current row is irrelevant for LAG/LEAD.

advance sql window p2

Query 3 : LAG/LEAD example

Copy Code
            
  SELECT w.row                             AS "current row",
         w.a                               AS a,
         w.b                               AS "partition",
         LAG (w.row, 1, 'no row') OVER win AS "lag",
         LEAD(w.row, 1, 'no row') OVER win AS "lead"
  FROM   sample AS w  
  WINDOW win AS (PARTITION BY w.b ORDER BY w.a)
  ORDER BY w.b, w.a;
            
        

OUTPUT -

  current row | a | partition |  lag   |  lead
  ------------+---+-----------+--------+--------
      Q1      | 1 | false     | no row | Q4
      Q4      | 3 | false     | Q1     | Q8
      Q8      | 6 | false     | Q4     | Q7
      Q7      | 6 | false     | Q8     | no row
      Q2      | 2 | true      | no row | Q5
      Q5      | 3 | true      | Q2     | Q3
      Q3      | 3 | true      | Q5     | Q6
      Q6      | 4 | true      | Q3     | Q9
      Q9      | 7 | true      | Q6     | no row
        

Explanation -

WIndow function is partitioned by 'b'. Therefore, the table is split into 2 partitions - true partition, and false partition.

For current row Q7, Lag(1) is Q8 as it is the previous value. Lead(1) is “no row” as it is the last record in the false partition.

Django Query 3 : LAG/LEAD example

Copy Code
            
  from django.db.models.functions import Lag, Lead

  window = {
  'partition_by': [F('b')],
  'frame': RowRange(start=None, end=0),
  'order_by': F('a').asc()
  }

  qs = Sample.objects.annotate(
            previous = Window(
               expression=Lag('a', 1), **window
             ), 
		    next_val = Window(
               expression=Lead('a', 1), **window
            ),		   
  ).order_by('b')

  for i in qs:
    print(i.row, i.a, i.b, i.previous, i.next_val)
            
        

OUTPUT -

Q1 1 False None 3
Q4 3 False 1 6
Q7 6 False 3 6
Q8 6 False 6 None
Q2 2 True None 3
Q3 3 True 2 3
Q5 3 True 3 4
Q6 4 True 3 7
Q9 7 True 4 None

This is not the end of window functions. There are more interesting and useful functions coming up in the next part of this window function blog series.

… to be continued

Back To Blogs


contact us