暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

云贝教育 |【技术文章】PostgreSQL 中的高级SQL 探秘:掌握窗口函数

云贝教育 2024-06-06
240

本文为云贝教育 刘峰 原创,请尊重知识产权,转发请注明出处,不接受任何抄袭、演绎和未经注明出处的转载。


在数据库技术的殿堂中,PostgreSQL(简称PG)以其开源、高效、稳定、功能丰富著称誉满载誉全球。在PG的众多特性中,窗函数(Window Functions)是SQL的明珠,为数据分析与处理提供了无与强大的武器。本文将深入浅出PG中的窗函数,引领您探索其精妙用法,解锁数据分析的新天地。


一、什么是窗函数?

窗函数,顾名思义,是在数据集上定义的一个“窗口”上执行的函数。不同于常规聚合函数对数据整体操作,窗函数可以在每行保持个体性的同时,考虑行间关系,实现动态地计算。这使得在分组排名、滚动统计、区间分析、移动平均等领域尤为得心应手。
引用官方文档的说明

函数描述

row_number () → bigint
返回其分区内当前行的编号,从 1 开始计数。

rank () → bigint

返回当前行的排名,有间隙;即,其对等组中第一行的 row_number。

dense_rank () → bigint

返回当前行的排名,无间隙;该功能可以有效地统计同级组的数量。

percent_rank () → double precision

返回当前行的相对排名,即 (rank - 1) / (分区总行数 - 1)。因此,该值的范围为 0 到 1(含 0 和 1)。

cume_dist () → double precision

返回累积分布,即(当前行之前或与当前行同级的分区行数)/(总分区行数)。因此该值的范围是从 1/N 到 1。

ntile ( num_bucketsinteger ) → integer

返回一个从 1 到参数值的整数,尽可能均等地划分分区。

lag ( value anycompatible [, offset integer [, default anycompatible ]] )  anycompatible

返回在分区内当前行之前的偏移行处计算的值;如果没有这样的行,则返回默认值(它必须是与值兼容的类型)。偏移量和默认值都是相对于当前行进行评估的。如果省略,则 offset 默认为 1,默认为 NULL。

lead ( value anycompatible [, offset integer [, default anycompatible ]] )  anycompatible

返回在分区内当前行之后的偏移行处计算的值;如果没有这样的行,则返回默认值(它必须是与值兼容的类型)。偏移量和默认值都是相对于当前行进行评估的。如果省略,则 offset 默认为 1,默认为 NULL。

first_value ( value anyelement ) → anyelement

返回在窗口框架第一行的行处计算的值。

last_value ( value anyelement ) → anyelement

返回在窗口框架的最后一行处计算的值。

nth_value ( value anyelement, n integer ) → anyelement

返回在窗口框架的第 n 行(从 1 开始计数)的行处计算的值;如果没有这样的行,则返回 NULL。


二、窗口函数的语法:


窗口函数调用表示对查询选择的行的某些部分应用类似聚合的函数。与非窗口聚合调用不同,这并不涉及将所选行分组为单个输出行 - 每行在查询输出中保持独立。但是,根据窗口函数调用的分组规范(PARTITION BY 列表),窗口函数可以访问属于当前行组的所有行。窗口函数调用的语法是以下之一:
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )


其中 window_definition 的语法如下
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]


可选的frame_clause可以是以下之一
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]


其中frame_start和frame_end可以是其中之一。
UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING


和frame_exclusion 可以是其中之一
EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS


三、窗口函数的使用示例:


3.1 模拟示例表
以下示例在测试环境中运行,请勿在生产环境运行。
drop table t2 ;

create table t2(id int,subject text,name text, score int);

insert into t2 select generate_series(1,10),'math','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(11,20),'english','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(21,30),'article','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(31,40),'sport','AA'||generate_series(1,10),floor(random()*100::int);


3.2 查看示例数据
drop table t2 ;

create table t2(id int,subject text,name text, score int);

insert into t2 select generate_series(1,10),'math','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(11,20),'english','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(21,30),'article','AA'||generate_series(1,10),floor(random()*100::int);
insert into t2 select generate_series(31,40),'sport','AA'||generate_series(1,10),floor(random()*100::int);


3.3 number_rank()查询每一科的学生成绩编号
select row_number() over (PARTITION BY subject order by score),* from t2 ;
 row_number | id | subject | name | score 
------------+----+---------+------+-------
          1 | 26 | article | AA6  |     0
          2 | 27 | article | AA7  |     7
          3 | 28 | article | AA8  |    13
          4 | 25 | article | AA5  |    26
          5 | 24 | article | AA4  |    30
          6 | 21 | article | AA1  |    45
          7 | 30 | article | AA10 |    45
          8 | 22 | article | AA2  |    55
          9 | 29 | article | AA9  |    69
         10 | 23 | article | AA3  |    97
          1 | 14 | english | AA4  |     0
          2 | 13 | english | AA3  |     4
          3 | 18 | english | AA8  |     7
          4 | 20 | english | AA10 |    38
          5 | 15 | english | AA5  |    38
          6 | 16 | english | AA6  |    41
          7 | 12 | english | AA2  |    56
          8 | 17 | english | AA7  |    70
          9 | 11 | english | AA1  |    74
         10 | 19 | english | AA9  |    77
          1 |  1 | math    | AA1  |    10
          2 |  8 | math    | AA8  |    19
          3 |  4 | math    | AA4  |    29
          4 |  7 | math    | AA7  |    36
          5 |  5 | math    | AA5  |    52
          6 |  6 | math    | AA6  |    68
          7 |  9 | math    | AA9  |    71
          8 |  3 | math    | AA3  |    72
          9 | 10 | math    | AA10 |    75
         10 |  2 | math    | AA2  |    89
          1 | 39 | sport   | AA9  |     6
          2 | 31 | sport   | AA1  |    19
          3 | 33 | sport   | AA3  |    25
          4 | 36 | sport   | AA6  |    28
          5 | 34 | sport   | AA4  |    32
          6 | 37 | sport   | AA7  |    33
          7 | 38 | sport   | AA8  |    50
          8 | 35 | sport   | AA5  |    62
          9 | 40 | sport   | AA10 |    73
         10 | 32 | sport   | AA2  |    81
(40 rows)


3.4 avg()查询每一科的平均成绩
postgres=# select subject,name,score,avg(score) over(PARTITION BY subject) from t2;
 subject | name | score |         avg         
---------+------+-------+---------------------
 article | AA1  |    45 | 38.7000000000000000
 article | AA2  |    55 | 38.7000000000000000
 article | AA3  |    97 | 38.7000000000000000
 article | AA4  |    30 | 38.7000000000000000
 article | AA5  |    26 | 38.7000000000000000
 article | AA6  |     0 | 38.7000000000000000
 article | AA7  |     7 | 38.7000000000000000
 article | AA8  |    13 | 38.7000000000000000
 article | AA9  |    69 | 38.7000000000000000
 article | AA10 |    45 | 38.7000000000000000
 english | AA1  |    74 | 40.5000000000000000
 english | AA2  |    56 | 40.5000000000000000
 english | AA3  |     4 | 40.5000000000000000
 english | AA4  |     0 | 40.5000000000000000
 english | AA5  |    38 | 40.5000000000000000
 english | AA6  |    41 | 40.5000000000000000
 english | AA7  |    70 | 40.5000000000000000
 english | AA8  |     7 | 40.5000000000000000
 english | AA9  |    77 | 40.5000000000000000
 english | AA10 |    38 | 40.5000000000000000
 math    | AA1  |    10 | 52.1000000000000000
 math    | AA2  |    89 | 52.1000000000000000
 math    | AA3  |    72 | 52.1000000000000000
 math    | AA4  |    29 | 52.1000000000000000
 math    | AA5  |    52 | 52.1000000000000000
 math    | AA6  |    68 | 52.1000000000000000
 math    | AA7  |    36 | 52.1000000000000000
 math    | AA8  |    19 | 52.1000000000000000
 math    | AA9  |    71 | 52.1000000000000000
 math    | AA10 |    75 | 52.1000000000000000
 sport   | AA1  |    19 | 40.9000000000000000
 sport   | AA2  |    81 | 40.9000000000000000
 sport   | AA3  |    25 | 40.9000000000000000
 sport   | AA4  |    32 | 40.9000000000000000
 sport   | AA5  |    62 | 40.9000000000000000
 sport   | AA6  |    28 | 40.9000000000000000
 sport   | AA7  |    33 | 40.9000000000000000
 sport   | AA8  |    50 | 40.9000000000000000
 sport   | AA9  |     6 | 40.9000000000000000
 sport   | AA10 |    73 | 40.9000000000000000
(40 rows)


3.5 rank()查询每一科的学生成绩排名(包括相同成绩的显示)
dense_rank()与rank()相反
postgres=# select rank() over (PARTITION BY subject order by score),* from t2 ;  
 rank | id | subject | name | score 
------+----+---------+------+-------
    1 | 26 | article | AA6  |     0
    2 | 27 | article | AA7  |     7
    3 | 28 | article | AA8  |    13
    4 | 25 | article | AA5  |    26
    5 | 24 | article | AA4  |    30
    6 | 21 | article | AA1  |    45  #--成绩一样的场景
    6 | 30 | article | AA10 |    45  #--成绩一样的场景
    8 | 22 | article | AA2  |    55
    9 | 29 | article | AA9  |    69
   10 | 23 | article | AA3  |    97
    1 | 14 | english | AA4  |     0
    2 | 13 | english | AA3  |     4
    3 | 18 | english | AA8  |     7
    4 | 20 | english | AA10 |    38
    4 | 15 | english | AA5  |    38
    6 | 16 | english | AA6  |    41
    7 | 12 | english | AA2  |    56
    8 | 17 | english | AA7  |    70
    9 | 11 | english | AA1  |    74
   10 | 19 | english | AA9  |    77
    1 |  1 | math    | AA1  |    10
    2 |  8 | math    | AA8  |    19
    3 |  4 | math    | AA4  |    29
    4 |  7 | math    | AA7  |    36
    5 |  5 | math    | AA5  |    52
    6 |  6 | math    | AA6  |    68
    7 |  9 | math    | AA9  |    71
    8 |  3 | math    | AA3  |    72
    9 | 10 | math    | AA10 |    75
   10 |  2 | math    | AA2  |    89
    1 | 39 | sport   | AA9  |     6
    2 | 31 | sport   | AA1  |    19
    3 | 33 | sport   | AA3  |    25
    4 | 36 | sport   | AA6  |    28
    5 | 34 | sport   | AA4  |    32
    6 | 37 | sport   | AA7  |    33
    7 | 38 | sport   | AA8  |    50
    8 | 35 | sport   | AA5  |    62
    9 | 40 | sport   | AA10 |    73
   10 | 32 | sport   | AA2  |    81
(40 rows)


3.6 lag()查前后偏移行数的值
postgres=# select lag(id,1) over (),* from t2 ;
 lag | id | subject | name | score 
-----+----+---------+------+-------
     |  1 | math    | AA1  |    10
   1 |  2 | math    | AA2  |    89
   2 |  3 | math    | AA3  |    72
   3 |  4 | math    | AA4  |    29
   4 |  5 | math    | AA5  |    52
   5 |  6 | math    | AA6  |    68
   6 |  7 | math    | AA7  |    36
   7 |  8 | math    | AA8  |    19
   8 |  9 | math    | AA9  |    71
   9 | 10 | math    | AA10 |    75
  10 | 11 | english | AA1  |    74
  11 | 12 | english | AA2  |    56
  12 | 13 | english | AA3  |     4
  13 | 14 | english | AA4  |     0
  14 | 15 | english | AA5  |    38
  15 | 16 | english | AA6  |    41
  16 | 17 | english | AA7  |    70
  17 | 18 | english | AA8  |     7
  18 | 19 | english | AA9  |    77
  19 | 20 | english | AA10 |    38
  20 | 21 | article | AA1  |    45
  21 | 22 | article | AA2  |    55
  22 | 23 | article | AA3  |    97
  23 | 24 | article | AA4  |    30
  24 | 25 | article | AA5  |    26
  25 | 26 | article | AA6  |     0
  26 | 27 | article | AA7  |     7
  27 | 28 | article | AA8  |    13
  28 | 29 | article | AA9  |    69
  29 | 30 | article | AA10 |    45
  30 | 31 | sport   | AA1  |    19
  31 | 32 | sport   | AA2  |    81
  32 | 33 | sport   | AA3  |    25
  33 | 34 | sport   | AA4  |    32
  34 | 35 | sport   | AA5  |    62
  35 | 36 | sport   | AA6  |    28
  36 | 37 | sport   | AA7  |    33
  37 | 38 | sport   | AA8  |    50
  38 | 39 | sport   | AA9  |     6
  39 | 40 | sport   | AA10 |    73
(40 rows)


3.7 first_value()查看分组第一个值
这里分组的第一个值由是否使用order by 决定

  • 无order by
postgres=# select  first_value(score) over (PARTITION BY subject ),* from t2 ; 
 first_value | id | subject | name | score 
-------------+----+---------+------+-------
          45 | 21 | article | AA1  |    45
          45 | 22 | article | AA2  |    55
          45 | 23 | article | AA3  |    97
          45 | 24 | article | AA4  |    30
          45 | 25 | article | AA5  |    26
          45 | 26 | article | AA6  |     0
          45 | 27 | article | AA7  |     7
          45 | 28 | article | AA8  |    13
          45 | 29 | article | AA9  |    69
          45 | 30 | article | AA10 |    45
          74 | 11 | english | AA1  |    74
          74 | 12 | english | AA2  |    56
          74 | 13 | english | AA3  |     4
          74 | 14 | english | AA4  |     0
          74 | 15 | english | AA5  |    38
          74 | 16 | english | AA6  |    41
          74 | 17 | english | AA7  |    70
          74 | 18 | english | AA8  |     7
          74 | 19 | english | AA9  |    77
          74 | 20 | english | AA10 |    38
          10 |  1 | math    | AA1  |    10
          10 |  2 | math    | AA2  |    89
          10 |  3 | math    | AA3  |    72
          10 |  4 | math    | AA4  |    29
          10 |  5 | math    | AA5  |    52
          10 |  6 | math    | AA6  |    68
          10 |  7 | math    | AA7  |    36
          10 |  8 | math    | AA8  |    19
          10 |  9 | math    | AA9  |    71
          10 | 10 | math    | AA10 |    75
          19 | 31 | sport   | AA1  |    19
          19 | 32 | sport   | AA2  |    81
          19 | 33 | sport   | AA3  |    25
          19 | 34 | sport   | AA4  |    32
          19 | 35 | sport   | AA5  |    62
          19 | 36 | sport   | AA6  |    28
          19 | 37 | sport   | AA7  |    33
          19 | 38 | sport   | AA8  |    50
          19 | 39 | sport   | AA9  |     6
          19 | 40 | sport   | AA10 |    73
(40 rows)


有order by

postgres=# select  first_value(score) over (PARTITION BY subject ),* from t2 ; 
 first_value | id | subject | name | score 
-------------+----+---------+------+-------
          45 | 21 | article | AA1  |    45
          45 | 22 | article | AA2  |    55
          45 | 23 | article | AA3  |    97
          45 | 24 | article | AA4  |    30
          45 | 25 | article | AA5  |    26
          45 | 26 | article | AA6  |     0
          45 | 27 | article | AA7  |     7
          45 | 28 | article | AA8  |    13
          45 | 29 | article | AA9  |    69
          45 | 30 | article | AA10 |    45
          74 | 11 | english | AA1  |    74
          74 | 12 | english | AA2  |    56
          74 | 13 | english | AA3  |     4
          74 | 14 | english | AA4  |     0
          74 | 15 | english | AA5  |    38
          74 | 16 | english | AA6  |    41
          74 | 17 | english | AA7  |    70
          74 | 18 | english | AA8  |     7
          74 | 19 | english | AA9  |    77
          74 | 20 | english | AA10 |    38
          10 |  1 | math    | AA1  |    10
          10 |  2 | math    | AA2  |    89
          10 |  3 | math    | AA3  |    72
          10 |  4 | math    | AA4  |    29
          10 |  5 | math    | AA5  |    52
          10 |  6 | math    | AA6  |    68
          10 |  7 | math    | AA7  |    36
          10 |  8 | math    | AA8  |    19
          10 |  9 | math    | AA9  |    71
          10 | 10 | math    | AA10 |    75
          19 | 31 | sport   | AA1  |    19
          19 | 32 | sport   | AA2  |    81
          19 | 33 | sport   | AA3  |    25
          19 | 34 | sport   | AA4  |    32
          19 | 35 | sport   | AA5  |    62
          19 | 36 | sport   | AA6  |    28
          19 | 37 | sport   | AA7  |    33
          19 | 38 | sport   | AA8  |    50
          19 | 39 | sport   | AA9  |     6
          19 | 40 | sport   | AA10 |    73
(40 rows)

last_value()分组最后一个值,与last_value()相反。


3.8 nth_value()查看分组指定值
postgres=# select  nth_value(score,3) over (PARTITION BY subject ),* from t2 ;   
 nth_value | id | subject | name | score 
-----------+----+---------+------+-------
        97 | 21 | article | AA1  |    45
        97 | 22 | article | AA2  |    55
        97 | 23 | article | AA3  |    97
        97 | 24 | article | AA4  |    30
        97 | 25 | article | AA5  |    26
        97 | 26 | article | AA6  |     0
        97 | 27 | article | AA7  |     7
        97 | 28 | article | AA8  |    13
        97 | 29 | article | AA9  |    69
        97 | 30 | article | AA10 |    45
         4 | 11 | english | AA1  |    74
         4 | 12 | english | AA2  |    56
         4 | 13 | english | AA3  |     4
         4 | 14 | english | AA4  |     0
         4 | 15 | english | AA5  |    38
         4 | 16 | english | AA6  |    41
         4 | 17 | english | AA7  |    70
         4 | 18 | english | AA8  |     7
         4 | 19 | english | AA9  |    77
         4 | 20 | english | AA10 |    38
        72 |  1 | math    | AA1  |    10
        72 |  2 | math    | AA2  |    89
        72 |  3 | math    | AA3  |    72
        72 |  4 | math    | AA4  |    29
        72 |  5 | math    | AA5  |    52
        72 |  6 | math    | AA6  |    68
        72 |  7 | math    | AA7  |    36
        72 |  8 | math    | AA8  |    19
        72 |  9 | math    | AA9  |    71
        72 | 10 | math    | AA10 |    75
        25 | 31 | sport   | AA1  |    19
        25 | 32 | sport   | AA2  |    81
        25 | 33 | sport   | AA3  |    25
        25 | 34 | sport   | AA4  |    32
        25 | 35 | sport   | AA5  |    62
        25 | 36 | sport   | AA6  |    28
        25 | 37 | sport   | AA7  |    33
        25 | 38 | sport   | AA8  |    50
        25 | 39 | sport   | AA9  |     6
        25 | 40 | sport   | AA10 |    73
(40 rows)


3.9 partition by使用别名
postgres=# select  avg(score) over(tmp),sum(score) over(tmp) ,* from t2 window tmp as (PARTITION BY subject);    
         avg         | sum | id | subject | name | score 
---------------------+-----+----+---------+------+-------
 38.7000000000000000 | 387 | 21 | article | AA1  |    45
 38.7000000000000000 | 387 | 22 | article | AA2  |    55
 38.7000000000000000 | 387 | 23 | article | AA3  |    97
 38.7000000000000000 | 387 | 24 | article | AA4  |    30
 38.7000000000000000 | 387 | 25 | article | AA5  |    26
 38.7000000000000000 | 387 | 26 | article | AA6  |     0
 38.7000000000000000 | 387 | 27 | article | AA7  |     7
 38.7000000000000000 | 387 | 28 | article | AA8  |    13
 38.7000000000000000 | 387 | 29 | article | AA9  |    69
 38.7000000000000000 | 387 | 30 | article | AA10 |    45
 40.5000000000000000 | 405 | 11 | english | AA1  |    74
 40.5000000000000000 | 405 | 12 | english | AA2  |    56
 40.5000000000000000 | 405 | 13 | english | AA3  |     4
 40.5000000000000000 | 405 | 14 | english | AA4  |     0
 40.5000000000000000 | 405 | 15 | english | AA5  |    38
 40.5000000000000000 | 405 | 16 | english | AA6  |    41
 40.5000000000000000 | 405 | 17 | english | AA7  |    70
 40.5000000000000000 | 405 | 18 | english | AA8  |     7
 40.5000000000000000 | 405 | 19 | english | AA9  |    77
 40.5000000000000000 | 405 | 20 | english | AA10 |    38
 52.1000000000000000 | 521 |  1 | math    | AA1  |    10
 52.1000000000000000 | 521 |  2 | math    | AA2  |    89
 52.1000000000000000 | 521 |  3 | math    | AA3  |    72
 52.1000000000000000 | 521 |  4 | math    | AA4  |    29
 52.1000000000000000 | 521 |  5 | math    | AA5  |    52
 52.1000000000000000 | 521 |  6 | math    | AA6  |    68
 52.1000000000000000 | 521 |  7 | math    | AA7  |    36
 52.1000000000000000 | 521 |  8 | math    | AA8  |    19
 52.1000000000000000 | 521 |  9 | math    | AA9  |    71
 52.1000000000000000 | 521 | 10 | math    | AA10 |    75
 40.9000000000000000 | 409 | 31 | sport   | AA1  |    19
 40.9000000000000000 | 409 | 32 | sport   | AA2  |    81
 40.9000000000000000 | 409 | 33 | sport   | AA3  |    25
 40.9000000000000000 | 409 | 34 | sport   | AA4  |    32
 40.9000000000000000 | 409 | 35 | sport   | AA5  |    62
 40.9000000000000000 | 409 | 36 | sport   | AA6  |    28
 40.9000000000000000 | 409 | 37 | sport   | AA7  |    33
 40.9000000000000000 | 409 | 38 | sport   | AA8  |    50
 40.9000000000000000 | 409 | 39 | sport   | AA9  |     6
 40.9000000000000000 | 409 | 40 | sport   | AA10 |    73
(40 rows)


四、优化中的使用示例

在SQL优化中,有些表被多次调用的情况下,可以通过窗口函数减少表的扫描次数

例如:查看每科的平均成绩


4.1 不使用窗口函数

postgres=# select a.subject,a.name,score,tmp.avgsore from t2 a left join (select b.subject,avg(score) as avgsore from t2 b group by b.subject) tmp on a.subject=tmp.subject order by a.subject; 
 subject | name | score |       avgsore       
---------+------+-------+---------------------
 article | AA1  |    45 | 38.7000000000000000
 article | AA2  |    55 | 38.7000000000000000
 article | AA3  |    97 | 38.7000000000000000
 article | AA4  |    30 | 38.7000000000000000
 article | AA5  |    26 | 38.7000000000000000
 article | AA6  |     0 | 38.7000000000000000
 article | AA7  |     7 | 38.7000000000000000
 article | AA8  |    13 | 38.7000000000000000
 article | AA9  |    69 | 38.7000000000000000
 article | AA10 |    45 | 38.7000000000000000
 english | AA1  |    74 | 40.5000000000000000
 english | AA2  |    56 | 40.5000000000000000
 english | AA3  |     4 | 40.5000000000000000
 english | AA4  |     0 | 40.5000000000000000
 english | AA5  |    38 | 40.5000000000000000
 english | AA6  |    41 | 40.5000000000000000
 english | AA7  |    70 | 40.5000000000000000
 english | AA8  |     7 | 40.5000000000000000
 english | AA9  |    77 | 40.5000000000000000
 english | AA10 |    38 | 40.5000000000000000
 math    | AA1  |    10 | 52.1000000000000000
 math    | AA2  |    89 | 52.1000000000000000
 math    | AA3  |    72 | 52.1000000000000000
 math    | AA4  |    29 | 52.1000000000000000
 math    | AA5  |    52 | 52.1000000000000000
 math    | AA6  |    68 | 52.1000000000000000
 math    | AA7  |    36 | 52.1000000000000000
 math    | AA8  |    19 | 52.1000000000000000
 math    | AA9  |    71 | 52.1000000000000000
 math    | AA10 |    75 | 52.1000000000000000
 sport   | AA1  |    19 | 40.9000000000000000
 sport   | AA2  |    81 | 40.9000000000000000
 sport   | AA3  |    25 | 40.9000000000000000
 sport   | AA4  |    32 | 40.9000000000000000
 sport   | AA5  |    62 | 40.9000000000000000
 sport   | AA6  |    28 | 40.9000000000000000
 sport   | AA7  |    33 | 40.9000000000000000
 sport   | AA8  |    50 | 40.9000000000000000
 sport   | AA9  |     6 | 40.9000000000000000
 sport   | AA10 |    73 | 40.9000000000000000
(40 rows)


执行计划如下

                                    QUERY PLAN                               
       
-------------------------------------------------------------------------------
 Sort  (cost=88.55..90.58 rows=810 width=100)
   Sort Key: a.subject
   ->  Hash Left Join  (cost=29.15..49.42 rows=810 width=100)
         Hash Cond: (a.subject = tmp.subject)
         ->  Seq Scan on t2 a  (cost=0.00..18.10 rows=810 width=68)
         ->  Hash  (cost=26.65..26.65 rows=200 width=64)
               ->  Subquery Scan on tmp  (cost=22.15..26.65 rows=200 width=64)
                     ->  HashAggregate  (cost=22.15..24.65 rows=200 width=64)
                           Group Key: b.subject
                           ->  Seq Scan on t2 b  (cost=0.00..18.10 rows=810 width=36)
(10 rows)


4.2 不使用窗口函数
postgres=# select a.subject,a.name,score,tmp.avgsore from t2 a left join (select b.subject,avg(score) as avgsore from t2 b group by b.subject) tmp on a.subject=tmp.subject order by a.subject; 
 subject | name | score |       avgsore       
---------+------+-------+---------------------
 article | AA1  |    45 | 38.7000000000000000
 article | AA2  |    55 | 38.7000000000000000
 article | AA3  |    97 | 38.7000000000000000
 article | AA4  |    30 | 38.7000000000000000
 article | AA5  |    26 | 38.7000000000000000
 article | AA6  |     0 | 38.7000000000000000
 article | AA7  |     7 | 38.7000000000000000
 article | AA8  |    13 | 38.7000000000000000
 article | AA9  |    69 | 38.7000000000000000
 article | AA10 |    45 | 38.7000000000000000
 english | AA1  |    74 | 40.5000000000000000
 english | AA2  |    56 | 40.5000000000000000
 english | AA3  |     4 | 40.5000000000000000
 english | AA4  |     0 | 40.5000000000000000
 english | AA5  |    38 | 40.5000000000000000
 english | AA6  |    41 | 40.5000000000000000
 english | AA7  |    70 | 40.5000000000000000
 english | AA8  |     7 | 40.5000000000000000
 english | AA9  |    77 | 40.5000000000000000
 english | AA10 |    38 | 40.5000000000000000
 math    | AA1  |    10 | 52.1000000000000000
 math    | AA2  |    89 | 52.1000000000000000
 math    | AA3  |    72 | 52.1000000000000000
 math    | AA4  |    29 | 52.1000000000000000
 math    | AA5  |    52 | 52.1000000000000000
 math    | AA6  |    68 | 52.1000000000000000
 math    | AA7  |    36 | 52.1000000000000000
 math    | AA8  |    19 | 52.1000000000000000
 math    | AA9  |    71 | 52.1000000000000000
 math    | AA10 |    75 | 52.1000000000000000
 sport   | AA1  |    19 | 40.9000000000000000
 sport   | AA2  |    81 | 40.9000000000000000
 sport   | AA3  |    25 | 40.9000000000000000
 sport   | AA4  |    32 | 40.9000000000000000
 sport   | AA5  |    62 | 40.9000000000000000
 sport   | AA6  |    28 | 40.9000000000000000
 sport   | AA7  |    33 | 40.9000000000000000
 sport   | AA8  |    50 | 40.9000000000000000
 sport   | AA9  |     6 | 40.9000000000000000
 sport   | AA10 |    73 | 40.9000000000000000
(40 rows)


执行计划如下

                             QUERY PLAN                            
------------------------------------------------------------------
 WindowAgg  (cost=57.23..71.41 rows=810 width=100)
   ->  Sort  (cost=57.23..59.26 rows=810 width=68)
         Sort Key: subject
         ->  Seq Scan on t2  (cost=0.00..18.10 rows=810 width=68)
(4 rows)

通过对比两个SQL的语义和结果,可以确认两者等价。但执行计划显示,第一个SQL对T2表扫描两次,而第二个SQL对T2表扫描一次,那必然是T2的执行计划更优。


五、总结

简而言之,窗口函数极大地扩展了SQL的表达能力,使数据处理更加灵活和精细,特别是在复杂数据分析任务中,它能够直接在数据库层面解决很多原本需要多层迭代或程序逻辑的问题,提高效率并简化数据处理流程。


想了解更多PostgreSQL相关的学习资料(技术文章和视频),可以微信公众号或B站搜索《云贝教育》,免费获取。

想了解更多PostgreSQL相关的学习资料(技术文章和视频),可以微信公众号或B站搜索《云贝教育》,免费获取。

想了解更多PostgreSQL相关的学习资料(技术文章和视频),可以微信公众号或B站搜索《云贝教育》,免费获取。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论