SQL中的LATERAL选项是PostgreSQL9.3版本支持的一个SQL标准,丰富了SQL的写法,简化了一些场景下SQL的代码量及提高查询效率。LATERAL连接是PostgreSQL和其他关系数据库(如Oracle、DB2和MS SQL)鲜为人知的特性之一。横向连接是一个非常有用的功能,下面会举一个例子看看究竟能解决什么问题。
语法
从下面可以看到SELECT已支持LATERAL选项,LATERAL不仅仅支持子查询的交叉引用,还支持函数的引用,本文只介绍子查询部分,函数的使用可参考官方文档。
postgres=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
postgres=# \h SELECT
Command: SELECT
Description: retrieve rows from a table or view
Syntax:
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
where from_item can be one of:
[ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ]
[ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] ) [ AS ] alias ( column_definition [, ...] )
[ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
[ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
and grouping_element can be one of:
( )
expression
( expression [, ...] )
ROLLUP ( { expression | ( expression [, ...] ) } [, ...] )
CUBE ( { expression | ( expression [, ...] ) } [, ...] )
GROUPING SETS ( grouping_element [, ...] )
and with_query is:
with_query_name [ ( column_name [, ...] ) ] AS [ [ NOT ] MATERIALIZED ] ( select | values | insert | update | delete )
TABLE [ ONLY ] table_name [ * ]
复制
LATERAL JOIN就是在SQL中进行一个foreach循环,如下:
SELECT <columns>
FROM <table reference>,
LATERAL <inner subquery>;
复制
这里的表引用可以是表、也可以是子查询。该示例中,会迭代<table reference>中的每一行对<inner subquery>的行进行评估。就像一个foreach循环。inner subquery返回的行会被加到lateral join的结果集中。使用lateral join的主要原因是inner subquery可以引用table reference中的行的列,并根据table reference中的列来决定inner subquery的返回值。为了帮助描述,以下是python代码模拟的lateral join:
result = []
for row1 in table_reference():
for row2 in inner_subquery(row1):
result += (row1, row2)
复制
应该可以看到,在内部for循环中迭代的行是在外部for循环中迭代的当前行的函数。
例子
让我们想象一个简单的例子:我们有一系列产品,而且我们还有客户的期望清单,现在的目标是为每个期望清单找到最好的3个产品,以下 SQL 片段创建了一些示例数据:
postgres=# CREATE TABLE t_product AS
postgres-# SELECT id AS product_id,
postgres-# id * 10 * random() AS price,
postgres-# 'product ' || id AS product
postgres-# FROM generate_series(1, 1000) AS id;
SELECT 1000
postgres=# CREATE TABLE t_wishlist
postgres-# (
postgres(# wishlist_id int,
postgres(# username text,
postgres(# desired_price numeric
postgres(# );
CREATE TABLE
postgres=# INSERT INTO t_wishlist VALUES
postgres-# (1, 'hans', '450'),
postgres-# (2, 'joe', '60'),
postgres-# (3, 'jane', '1500');
INSERT 0 3
postgres=# TABLE t_wishlist;
wishlist_id | username | desired_price
-------------+----------+---------------
1 | hans | 450
2 | joe | 60
3 | jane | 1500
(3 rows)
postgres=# TABLE t_product LIMIT 10;
product_id | price | product
------------+--------------------+------------
1 | 1.65659536374676 | product 1
2 | 12.444441358552112 | product 2
3 | 24.94867036294174 | product 3
4 | 11.308674375101901 | product 4
5 | 46.043077132888044 | product 5
6 | 58.55024222104113 | product 6
7 | 0.7335950353502341 | product 7
8 | 57.536423461042716 | product 8
9 | 20.26876394540494 | product 9
10 | 33.04382211753243 | product 10
(10 rows)
复制
在提供一些示例数据并将其加载到我们的PostgreSQL数据库后,我们可以解决问题并尝试提出解决方案。
假设我们想用伪代码找到每个期望的前三个产品:
for x in wishlist
loop
for y in products order by price desc
loop
found++
if found <= 3
then
return row
else
jump to next wish
end
end loop
end loop
复制
重要的是我们需要两个循环,首先我们需要遍历期望清单,然后查看已排序的产品列表,选择3条并转到下一个期望清单。首先我们来看下不用LATERAL怎么实现这个需求,我们可以使用分析函数来实现:
postgres=# SELECT *
postgres-# FROM (SELECT *,
postgres(# row_number() over(PARTITION BY w.wishlist_id ORDER BY p.price DESC) top
postgres(# FROM t_wishlist w,
postgres(# t_product p
postgres(# WHERE w.desired_price > p.price) t
postgres-# WHERE t.top < 4;
wishlist_id | username | desired_price | product_id | price | product | top
-------------+----------+---------------+------------+--------------------+-------------+-----
1 | hans | 450 | 817 | 449.98034915946505 | product 817 | 1
1 | hans | 450 | 603 | 444.2264364298418 | product 603 | 2
1 | hans | 450 | 63 | 443.03316578167573 | product 63 | 3
2 | joe | 60 | 6 | 58.55024222104113 | product 6 | 1
2 | joe | 60 | 12 | 58.42654077971005 | product 12 | 2
2 | joe | 60 | 8 | 57.536423461042716 | product 8 | 3
3 | jane | 1500 | 182 | 1482.6542393751656 | product 182 | 1
3 | jane | 1500 | 493 | 1468.8853400854844 | product 493 | 2
3 | jane | 1500 | 231 | 1452.339885806623 | product 231 | 3
(9 rows)
复制
接下来我们使用LATERAL JOIN实现:
postgres=# SELECT *
postgres-# FROM t_wishlist AS w,
postgres-# lateral (SELECT *
postgres(# FROM t_product AS p
postgres(# WHERE p.price < w.desired_price
postgres(# ORDER BY p.price DESC LIMIT 3) AS x
postgres-# ORDER BY wishlist_id,price DESC;
wishlist_id | username | desired_price | product_id | price | product
-------------+----------+---------------+------------+--------------------+-------------
1 | hans | 450 | 817 | 449.98034915946505 | product 817
1 | hans | 450 | 603 | 444.2264364298418 | product 603
1 | hans | 450 | 63 | 443.03316578167573 | product 63
2 | joe | 60 | 6 | 58.55024222104113 | product 6
2 | joe | 60 | 12 | 58.42654077971005 | product 12
2 | joe | 60 | 8 | 57.536423461042716 | product 8
3 | jane | 1500 | 182 | 1482.6542393751656 | product 182
3 | jane | 1500 | 493 | 1468.8853400854844 | product 493
3 | jane | 1500 | 231 | 1452.339885806623 | product 231
(9 rows)
复制
从上面可以看到两个SQL语句结果一致,代码量相当,但是使用LATERAL的逻辑比分析函数容易理解。
下面查看两种的执行计划:
postgres=# EXPLAIN ANALYZE
SELECT *
FROM (SELECT *,
row_number() over(PARTITION BY w.wishlist_id ORDER BY p.price DESC) top
FROM t_wishlist w,
t_product p
WHERE w.desired_price > p.price) t
WHERE t.top < 4;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t (cost=50949.96..60158.28 rows=94444 width=99) (actual time=1.765..2.179 rows=9 loops=1)
Filter: (t.top < 4)
Rows Removed by Filter: 656
-> WindowAgg (cost=50949.96..56616.62 rows=283333 width=99) (actual time=1.764..2.134 rows=665 loops=1)
-> Sort (cost=50949.96..51658.29 rows=283333 width=91) (actual time=1.757..1.802 rows=665 loops=1)
Sort Key: w.wishlist_id, p.price DESC
Sort Method: quicksort Memory: 76kB
-> Nested Loop (cost=0.00..14913.62 rows=283333 width=91) (actual time=0.026..1.423 rows=665 loops=1)
Join Filter: ((w.desired_price)::double precision > p.price)
Rows Removed by Join Filter: 2335
-> Seq Scan on t_product p (cost=0.00..18.00 rows=1000 width=23) (actual time=0.012..0.118 rows=1000 loops=1)
-> Materialize (cost=0.00..22.75 rows=850 width=68) (actual time=0.000..0.000 rows=3 loops=1000)
-> Seq Scan on t_wishlist w (cost=0.00..18.50 rows=850 width=68) (actual time=0.003..0.004 rows=3 loops=1)
Planning Time: 0.132 ms
Execution Time: 2.217 ms
(15 rows)
postgres=# EXPLAIN ANALYZE
postgres-# SELECT *
postgres-# FROM t_wishlist AS w,
postgres-# lateral (SELECT *
postgres(# FROM t_product AS p
postgres(# WHERE p.price < w.desired_price
postgres(# ORDER BY p.price DESC LIMIT 3) AS x
postgres-# ORDER BY wishlist_id,price DESC;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=23428.53..23434.90 rows=2550 width=91) (actual time=0.853..0.855 rows=9 loops=1)
Sort Key: w.wishlist_id, p.price DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=27.30..23284.24 rows=2550 width=91) (actual time=0.330..0.845 rows=9 loops=1)
-> Seq Scan on t_wishlist w (cost=0.00..18.50 rows=850 width=68) (actual time=0.010..0.010 rows=3 loops=1)
-> Limit (cost=27.30..27.31 rows=3 width=23) (actual time=0.276..0.276 rows=3 loops=3)
-> Sort (cost=27.30..28.14 rows=333 width=23) (actual time=0.275..0.275 rows=3 loops=3)
Sort Key: p.price DESC
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on t_product p (cost=0.00..23.00 rows=333 width=23) (actual time=0.006..0.250 rows=222 loops=3)
Filter: (price < (w.desired_price)::double precision)
Rows Removed by Filter: 778
Planning Time: 0.104 ms
Execution Time: 0.878 ms
(14 rows)
复制
在都没有使用索引的情况下,评估的成本值LATERAL占优,实际的运行时间LATERAL也要更快。
参考:
https://www.postgresql.org/docs/current/sql-select.html
https://www.cybertec-postgresql.com/en/understanding-lateral-joins-in-postgresql/