作者
digoal
日期
2021-09-15
标签
PostgreSQL , Oracle , skip index scan , cte , recursive , 递归
递归是PG 8.4开始支持的语法, 在某些场景非常有效, 例如图式搜索, 树型查询, 模拟skip index scan的唯一值加速等.
《PostgrSQL 递归SQL的几个应用 - 极客与正常人的思维》
但是使用CTE时你可能会遇到一些不支持的语法, 不要方章, 我这里给出了解决方案.
下面这个例子是求t_log表里有多少个gid.
create table t_log (id int8 primary key, gid int, info text, crt_time timestamp); insert into t_log select generate_series(1,10000000), random()*1000, md5(random()::text), clock_timestamp(); create index idx_t_log_1 on t_log (gid);
复制
1、错误1, limit 1不能出现在union中, 包起来就可以了
with recursive tmp as ( select gid from t_log order by gid limit 1 union all select gid from t_log where gid > (select gid from tmp) order by gid limit 1 ) select count(*) from tmp; ERROR: syntax error at or near "union" LINE 3: union all ^
复制
2、错误2, 递归表不能自包含在递归子句的字查询中.
使用lateral JOIN解决. 或
使用select 中的sub query消除where中的子查询
with recursive tmp as ( -- 递归表tmp select gid from (select gid from t_log order by gid limit 1) t union all select gid from t_log where gid > (select gid from tmp) order by gid limit 1 -- 递归子句, 子查询内包含递归表tmp. ) select count(*) from tmp; ERROR: recursive reference to query "tmp" must not appear within a subquery LINE 4: ...elect gid from t_log where gid > (select gid from tmp) order... ^
复制
3、错误3, 递归子句不支持order by
用聚合代替order by
或, 把order by包起来
with recursive tmp as ( select gid from (select gid from t_log order by gid limit 1) t union all select t_log.gid from t_log join tmp on (t_log.gid > tmp.gid) order by t_log.gid limit 1 -- 递归子句包含order by ) select count(*) from tmp; ERROR: ORDER BY in a recursive query is not implemented LINE 4: ... t_log join tmp on (t_log.gid > tmp.gid) order by t_log.gid ... ^
复制
4、错误4, 递归子句不支持聚合
可以把聚合放在select 中的sub query
with recursive tmp as ( select gid from (select gid from t_log order by gid limit 1) t union all select min(t_log.gid) from t_log join tmp on (t_log.gid > tmp.gid) -- 递归子句, 不支持聚合 ) select count(*) from tmp; ERROR: aggregate functions are not allowed in a recursive query's recursive term LINE 4: select min(t_log.gid) from t_log join tmp on (t_log.gid > tm... ^
复制
解决方法1:
使用cross join lateral语法, 同时把order by、关联过滤放在sub query内. 这个方法会使用nest loop join, 需要注意子句的效率
with recursive tmp as ( select gid from (select gid from t_log order by gid limit 1) t union all select l.gid from tmp CROSS join lateral (select t_log.gid from t_log where t_log.gid>tmp.gid order by t_log.gid limit 1) l -- nest loop join, 注意子句的查询效率 ) select count(*) from tmp;
复制
解决方法2:
使用select内的sub query, 聚合放在sub query内. 这个方法需要注意的点: select子查询 只能返回单列值, 同时可能返回null, 需要过滤一下.
with recursive tmp as ( ( select gid from (select gid from t_log order by gid limit 1) t ) union all ( select (select min(t.gid) from t_log t where t.gid > s.gid) -- 这个形式的子查询 只能返回单列值, 如果有多列需要拼装成复合类型、array或者json. from tmp s where s.gid is not null ) ) select count(*) from tmp where gid is not null; -- 以上子查询形式 会多返回一次null, 所以要过滤一下
复制
解决方法3:
使用select内的sub query, order by放在sub query内. 这个方法需要注意的点: select子查询 只能返回单列值, 同时可能返回null, 需要过滤一下.
with recursive tmp as ( ( select gid from (select gid from t_log order by gid limit 1) t ) union all ( select (select t.gid from t_log t where t.gid > s.gid order by t.gid limit 1) as gid -- 这个形式的子查询 只能返回单列值, 如果有多列需要拼装成复合类型、array或者json. from tmp s where s.gid is not null ) ) select count(*) from tmp where gid is not null; -- 以上子查询形式 会多返回一次null, 所以要过滤一下
复制
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.