暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

PostgreSQL的LATERAL JOIN介绍

原创 贺晓群 2021-07-13
6845

SQL中的LATERAL选项是PostgreSQL9.3版本支持的一个SQL标准,丰富了SQL的写法,简化了一些场景下SQL的代码量及提高查询效率。LATERAL连接是PostgreSQL和其他关系数据库(如Oracle、DB2和MS SQL)鲜为人知的特性之一。横向连接是一个非常有用的功能,下面会举一个例子看看究竟能解决什么问题。

语法

从下面可以看到SELECT已支持LATERAL选项,LATERAL不仅仅支持子查询的交叉引用,还支持函数的引用,本文只介绍子查询部分,函数的使用可参考官方文档。

postgres=# SELECT version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit (1 row) postgres=# \h SELECT Command: SELECT Description: retrieve rows from a table or view Syntax: [ WITH [ RECURSIVE ] with_query [, ...] ] SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ] [ * | expression [ [ AS ] output_name ] [, ...] ] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY grouping_element [, ...] ] [ HAVING condition ] [ WINDOW window_name AS ( window_definition ) [, ...] ] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: [ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ] [ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ] with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ LATERAL ] function_name ( [ argument [, ...] ] ) [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ LATERAL ] function_name ( [ argument [, ...] ] ) [ AS ] alias ( column_definition [, ...] ) [ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] ) [ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] ) [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ] from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ] and grouping_element can be one of: ( ) expression ( expression [, ...] ) ROLLUP ( { expression | ( expression [, ...] ) } [, ...] ) CUBE ( { expression | ( expression [, ...] ) } [, ...] ) GROUPING SETS ( grouping_element [, ...] ) and with_query is: with_query_name [ ( column_name [, ...] ) ] AS [ [ NOT ] MATERIALIZED ] ( select | values | insert | update | delete ) TABLE [ ONLY ] table_name [ * ]
复制

LATERAL JOIN就是在SQL中进行一个foreach循环,如下:

SELECT <columns> FROM <table reference>, LATERAL <inner subquery>;
复制

这里的表引用可以是表、也可以是子查询。该示例中,会迭代<table reference>中的每一行对<inner subquery>的行进行评估。就像一个foreach循环。inner subquery返回的行会被加到lateral join的结果集中。使用lateral join的主要原因是inner subquery可以引用table reference中的行的列,并根据table reference中的列来决定inner subquery的返回值。为了帮助描述,以下是python代码模拟的lateral join:

result = [] for row1 in table_reference(): for row2 in inner_subquery(row1): result += (row1, row2)
复制

应该可以看到,在内部for循环中迭代的行是在外部for循环中迭代的当前行的函数。

例子

让我们想象一个简单的例子:我们有一系列产品,而且我们还有客户的期望清单,现在的目标是为每个期望清单找到最好的3个产品,以下 SQL 片段创建了一些示例数据:

postgres=# CREATE TABLE t_product AS postgres-# SELECT id AS product_id, postgres-# id * 10 * random() AS price, postgres-# 'product ' || id AS product postgres-# FROM generate_series(1, 1000) AS id; SELECT 1000 postgres=# CREATE TABLE t_wishlist postgres-# ( postgres(# wishlist_id int, postgres(# username text, postgres(# desired_price numeric postgres(# ); CREATE TABLE postgres=# INSERT INTO t_wishlist VALUES postgres-# (1, 'hans', '450'), postgres-# (2, 'joe', '60'), postgres-# (3, 'jane', '1500'); INSERT 0 3 postgres=# TABLE t_wishlist; wishlist_id | username | desired_price -------------+----------+--------------- 1 | hans | 450 2 | joe | 60 3 | jane | 1500 (3 rows) postgres=# TABLE t_product LIMIT 10; product_id | price | product ------------+--------------------+------------ 1 | 1.65659536374676 | product 1 2 | 12.444441358552112 | product 2 3 | 24.94867036294174 | product 3 4 | 11.308674375101901 | product 4 5 | 46.043077132888044 | product 5 6 | 58.55024222104113 | product 6 7 | 0.7335950353502341 | product 7 8 | 57.536423461042716 | product 8 9 | 20.26876394540494 | product 9 10 | 33.04382211753243 | product 10 (10 rows)
复制

在提供一些示例数据并将其加载到我们的PostgreSQL数据库后,我们可以解决问题并尝试提出解决方案。
假设我们想用伪代码找到每个期望的前三个产品:

for x in wishlist loop for y in products order by price desc loop found++ if found <= 3 then return row else jump to next wish end end loop end loop
复制

重要的是我们需要两个循环,首先我们需要遍历期望清单,然后查看已排序的产品列表,选择3条并转到下一个期望清单。首先我们来看下不用LATERAL怎么实现这个需求,我们可以使用分析函数来实现:

postgres=# SELECT * postgres-# FROM (SELECT *, postgres(# row_number() over(PARTITION BY w.wishlist_id ORDER BY p.price DESC) top postgres(# FROM t_wishlist w, postgres(# t_product p postgres(# WHERE w.desired_price > p.price) t postgres-# WHERE t.top < 4; wishlist_id | username | desired_price | product_id | price | product | top -------------+----------+---------------+------------+--------------------+-------------+----- 1 | hans | 450 | 817 | 449.98034915946505 | product 817 | 1 1 | hans | 450 | 603 | 444.2264364298418 | product 603 | 2 1 | hans | 450 | 63 | 443.03316578167573 | product 63 | 3 2 | joe | 60 | 6 | 58.55024222104113 | product 6 | 1 2 | joe | 60 | 12 | 58.42654077971005 | product 12 | 2 2 | joe | 60 | 8 | 57.536423461042716 | product 8 | 3 3 | jane | 1500 | 182 | 1482.6542393751656 | product 182 | 1 3 | jane | 1500 | 493 | 1468.8853400854844 | product 493 | 2 3 | jane | 1500 | 231 | 1452.339885806623 | product 231 | 3 (9 rows)
复制

接下来我们使用LATERAL JOIN实现:

postgres=# SELECT * postgres-# FROM t_wishlist AS w, postgres-# lateral (SELECT * postgres(# FROM t_product AS p postgres(# WHERE p.price < w.desired_price postgres(# ORDER BY p.price DESC LIMIT 3) AS x postgres-# ORDER BY wishlist_id,price DESC; wishlist_id | username | desired_price | product_id | price | product -------------+----------+---------------+------------+--------------------+------------- 1 | hans | 450 | 817 | 449.98034915946505 | product 817 1 | hans | 450 | 603 | 444.2264364298418 | product 603 1 | hans | 450 | 63 | 443.03316578167573 | product 63 2 | joe | 60 | 6 | 58.55024222104113 | product 6 2 | joe | 60 | 12 | 58.42654077971005 | product 12 2 | joe | 60 | 8 | 57.536423461042716 | product 8 3 | jane | 1500 | 182 | 1482.6542393751656 | product 182 3 | jane | 1500 | 493 | 1468.8853400854844 | product 493 3 | jane | 1500 | 231 | 1452.339885806623 | product 231 (9 rows)
复制

从上面可以看到两个SQL语句结果一致,代码量相当,但是使用LATERAL的逻辑比分析函数容易理解。

下面查看两种的执行计划:

postgres=# EXPLAIN ANALYZE SELECT * FROM (SELECT *, row_number() over(PARTITION BY w.wishlist_id ORDER BY p.price DESC) top FROM t_wishlist w, t_product p WHERE w.desired_price > p.price) t WHERE t.top < 4; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on t (cost=50949.96..60158.28 rows=94444 width=99) (actual time=1.765..2.179 rows=9 loops=1) Filter: (t.top < 4) Rows Removed by Filter: 656 -> WindowAgg (cost=50949.96..56616.62 rows=283333 width=99) (actual time=1.764..2.134 rows=665 loops=1) -> Sort (cost=50949.96..51658.29 rows=283333 width=91) (actual time=1.757..1.802 rows=665 loops=1) Sort Key: w.wishlist_id, p.price DESC Sort Method: quicksort Memory: 76kB -> Nested Loop (cost=0.00..14913.62 rows=283333 width=91) (actual time=0.026..1.423 rows=665 loops=1) Join Filter: ((w.desired_price)::double precision > p.price) Rows Removed by Join Filter: 2335 -> Seq Scan on t_product p (cost=0.00..18.00 rows=1000 width=23) (actual time=0.012..0.118 rows=1000 loops=1) -> Materialize (cost=0.00..22.75 rows=850 width=68) (actual time=0.000..0.000 rows=3 loops=1000) -> Seq Scan on t_wishlist w (cost=0.00..18.50 rows=850 width=68) (actual time=0.003..0.004 rows=3 loops=1) Planning Time: 0.132 ms Execution Time: 2.217 ms (15 rows) postgres=# EXPLAIN ANALYZE postgres-# SELECT * postgres-# FROM t_wishlist AS w, postgres-# lateral (SELECT * postgres(# FROM t_product AS p postgres(# WHERE p.price < w.desired_price postgres(# ORDER BY p.price DESC LIMIT 3) AS x postgres-# ORDER BY wishlist_id,price DESC; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Sort (cost=23428.53..23434.90 rows=2550 width=91) (actual time=0.853..0.855 rows=9 loops=1) Sort Key: w.wishlist_id, p.price DESC Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=27.30..23284.24 rows=2550 width=91) (actual time=0.330..0.845 rows=9 loops=1) -> Seq Scan on t_wishlist w (cost=0.00..18.50 rows=850 width=68) (actual time=0.010..0.010 rows=3 loops=1) -> Limit (cost=27.30..27.31 rows=3 width=23) (actual time=0.276..0.276 rows=3 loops=3) -> Sort (cost=27.30..28.14 rows=333 width=23) (actual time=0.275..0.275 rows=3 loops=3) Sort Key: p.price DESC Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on t_product p (cost=0.00..23.00 rows=333 width=23) (actual time=0.006..0.250 rows=222 loops=3) Filter: (price < (w.desired_price)::double precision) Rows Removed by Filter: 778 Planning Time: 0.104 ms Execution Time: 0.878 ms (14 rows)
复制

在都没有使用索引的情况下,评估的成本值LATERAL占优,实际的运行时间LATERAL也要更快。

参考:
https://www.postgresql.org/docs/current/sql-select.html
https://www.cybertec-postgresql.com/en/understanding-lateral-joins-in-postgresql/

最后修改时间:2021-07-13 18:32:03
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论