PostgreSQL求解连续数段问题

原创贺晓群 2020-06-29

5448

有同事问我是否可以用一条SQL实现以下需求：社保缴存在换工作时可能会有中断，现在需要查询出每个人连续缴存起始月、终止月和月数，如果有中断需要把多个分段同时求出，这个可以使用PG的窗口函数实现，下面给出一个具体的实例。

创建测试表test，总共4个字段，name为姓名，jfyear社保缴存的年份，jfmonth社保缴存的月份，ispay是否已缴存（值为“已到账”代表已缴存，其它值为未缴存）。

postgres=# CREATE TABLE test(name VARCHAR(64),jfyear VARCHAR(4),jfmonth VARCHAR(2),ispay VARCHAR(3));
CREATE TABLE

postgres=# INSERT INTO test
postgres-#     (NAME, jfyear, jfmonth, ispay)
postgres-# VALUES
postgres-#     ('张三', '2020', '05', '未到账'),
postgres-#     ('张三', '2020', '04', '已到账'),
postgres-#     ('张三', '2020', '03', '已到账'),
postgres-#     ('张三', '2020', '02', '已到账'),
postgres-#     ('张三', '2020', '01', '已到账'),
postgres-#     ('张三', '2019', '12', '已到账'),
postgres-#     ('张三', '2019', '11', NULL),
postgres-#     ('张三', '2019', '10', '未到账'),
postgres-#     ('张三', '2019', '09', '已到账'),
postgres-#     ('李四', '2020', '05', NULL),
postgres-#     ('李四', '2020', '04', '已到账'),
postgres-#     ('李四', '2020', '03', '已到账'),
postgres-#     ('李四', '2020', '02', NULL),
postgres-#     ('李四', '2020', '01', '已到账'),
postgres-#     ('李四', '2019', '12', '已到账'),
postgres-#     ('李四', '2019', '11', NULL),
postgres-#     ('李四', '2019', '10', '未到账'),
postgres-#     ('李四', '2019', '09', '已到账'),
postgres-#     ('李四', '2019', '08', '已到账'),
postgres-#     ('李四', '2019', '07', '已到账'),
postgres-#     ('李四', '2019', '06', '已到账');
INSERT 0 21
postgres=# SELECT * FROM test;
 name | jfyear | jfmonth | ispay  
------+--------+---------+--------
 张三 | 2020   | 05      | 未到账
 张三 | 2020   | 04      | 已到账
 张三 | 2020   | 03      | 已到账
 张三 | 2020   | 02      | 已到账
 张三 | 2020   | 01      | 已到账
 张三 | 2019   | 12      | 已到账
 张三 | 2019   | 11      | 
 张三 | 2019   | 10      | 未到账
 张三 | 2019   | 09      | 已到账
 李四 | 2020   | 05      | 
 李四 | 2020   | 04      | 已到账
 李四 | 2020   | 03      | 已到账
 李四 | 2020   | 02      | 
 李四 | 2020   | 01      | 已到账
 李四 | 2019   | 12      | 已到账
 李四 | 2019   | 11      | 
 李四 | 2019   | 10      | 未到账
 李四 | 2019   | 09      | 已到账
 李四 | 2019   | 08      | 已到账
 李四 | 2019   | 07      | 已到账
 李四 | 2019   | 06      | 已到账
(21 rows)

postgres=# SELECT name,
postgres-#        e AS start_mon, --连续数的开始月份
postgres-#        lead(s, 1, maxn) over(PARTITION BY name ORDER BY e) end_mon, --连续数的结束月份
postgres-#        extract(YEAR FROM age(to_date(lead(s, 1, maxn) over(PARTITION BY name ORDER BY e), 'yyyymm'), to_date(e, 'yyyymm'))) * 12 +
postgres-#        extract(MONTH FROM age(to_date(lead(s, 1, maxn) over(PARTITION BY name ORDER BY e), 'yyyymm'), to_date(e, 'yyyymm'))) + 1 mons --此连续段连续缴存总月份数
postgres-#   FROM (SELECT name,
postgres(#                lag(jfyear || jfmonth, 1) over(PARTITION BY name ORDER BY jfyear || jfmonth) s, --偏移查询，查询当前行前一行数据
postgres(#                jfyear || jfmonth e, --拼接年月
postgres(#                MAX(jfyear || jfmonth) over(PARTITION BY name) maxn --查询每个人最大的缴存月份
postgres(#           FROM test
postgres(#          WHERE ispay = '已到账') a --查询已缴存月份，这样就会产生分段
postgres-#  WHERE coalesce(age(to_date(e, 'yyyymm'), to_date(s, 'yyyymm')) - INTERVAL '1 mon', INTERVAL '1 mon') <> INTERVAL '00:00:00' --这里是关键，过滤掉连续的数据行
postgres-#  ORDER BY name,e;
 name | start_mon | end_mon | mons 
------+-----------+---------+------
 李四 | 201906    | 201909  |    4
 李四 | 201912    | 202001  |    2
 李四 | 202003    | 202004  |    2
 张三 | 201909    | 201909  |    1
 张三 | 201912    | 202004  |    5
(5 rows)

postgresql

最后修改时间：2020-06-29 15:33:11

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL求解连续数段问题

评论