How far can you optimize a PostgreSQL IN list with 10,000+ values?

励志成为PostgreSQL大神 2021-03-25


Preface

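Today I came across a StackExchange post about optimizing a "Large IN", and I found it well worth studying. Our developers have long had SQL statements that run slowly because their IN lists contain too many values, and we never got around to fixing them. It is time to do so.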

Optimizing a Large IN

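A "Large IN" is a statement that uses IN with a very large number of values in the list. We once hit statements like this on Oracle with more than 60,000 values. Running them triggered a bug outright, because the IN list used bind variables and Oracle allows at most 65,536 bind variables.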

Now let's try the same thing on PostgreSQL. The test table is 292 GB and comes from a pgbench TPS benchmark environment we built earlier.

pgbench=# select pg_size_pretty(pg_total_relation_size('pgbench_accounts'));
 pg_size_pretty 
----------------
 292 GB
(1 row)

pgbench=# select pg_size_pretty(pg_total_relation_size('pgbench_branches'));
 pg_size_pretty 
----------------
 6520 kB
(1 row)

pgbench=# \d pgbench_accounts
              Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default 
----------+---------------+-----------+----------+---------
 aid      | bigint        |           | not null | 
 bid      | integer       |           |          | 
 abalance | integer       |           |          | 
 filler   | character(84) |           |          | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)

pgbench=# \d pgbench_branches
              Table "public.pgbench_branches"
  Column  |     Type      | Collation | Nullable | Default 
----------+---------------+-----------+----------+---------
 bid      | integer       |           | not null | 
 bbalance | integer       |           |          | 
 filler   | character(88) |           |          | 
Indexes:
    "pgbench_branches_pkey" PRIMARY KEY, btree (bid)


First, test an IN list with 20 values. The query is usually written like this:

explain (analyze,buffers) select * from pgbench_accounts,pgbench_branches where
pgbench_accounts.bid=pgbench_branches.bid and
aid in 
(
10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000
);
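
In practice an application builds lists like this in code. Below is a minimal sketch, assuming the aid values arrive as a Python iterable of integers. The function name build_in_list_query is hypothetical, and it reproduces the 20-value query above. Each value is passed through int() so that only integer literals can reach the SQL text. With 10,000+ ids, the same helper simply produces a very long IN list.

```python
def build_in_list_query(ids):
    """Return the IN-list form of the pgbench query for the given aid values (hypothetical helper)."""
    # int() guarantees only integer literals are embedded in the SQL text.
    values = ", ".join(str(int(i)) for i in ids)
    return (
        "SELECT * FROM pgbench_accounts, pgbench_branches "
        "WHERE pgbench_accounts.bid = pgbench_branches.bid "
        f"AND aid IN ({values});"
    )


# The 20 values used in the article: 10000, 11000, ..., 29000.
ids = list(range(10000, 30000, 1000))
sql = build_in_list_query(ids)
print(sql)
```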


Across several runs the execution time was 0.203 ms, which is very fast.

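By default, an IN with a list of constants is evaluated as an in-list lookup. When the list holds a large number of values, we can use a constant subquery (VALUES) or a temporary table instead.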

Let's try the constant subquery first.

explain (analyze,buffers) 
SELECT * 
   FROM pgbench_accounts,pgbench_branches  where
pgbench_accounts.bid=pgbench_branches.bid and
 aid in 
(
VALUES (10000), (11000), (12000), (13000), (14000), (15000), (16000), (17000), (18000), (19000), (20000), (21000), (22000), (23000), (24000), (25000), (26000), (27000), (28000), (29000)
);
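
The VALUES variant differs from the in-list only in that each value becomes its own one-column row. The sketch below assumes the ids come from application code, and build_values_query is a hypothetical name. It emits the same statement as the query above, so the two forms can be generated from one list and compared with EXPLAIN.

```python
def build_values_query(ids):
    """Return the constant-subquery (VALUES) form of the pgbench query (hypothetical helper)."""
    # Each id becomes a single-column row: (10000), (11000), ...
    rows = ", ".join(f"({int(i)})" for i in ids)
    return (
        "SELECT * FROM pgbench_accounts, pgbench_branches "
        "WHERE pgbench_accounts.bid = pgbench_branches.bid "
        f"AND aid IN (VALUES {rows});"
    )


ids = list(range(10000, 30000, 1000))
sql = build_values_query(ids)
print(sql)
```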


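The constant subquery turns out to have no efficiency advantage. Next, let's test the temporary-table approach. Here it is written as a CTE that unnests an array into rows and joins them against pgbench_accounts: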

explain analyze 
with tmp_pgbench_accounts as (
select unnest('{
10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000
}'::bigint[]) "aid")
select * 
from  
pgbench_accounts,pgbench_branches,tmp_pgbench_accounts
where 
pgbench_accounts.bid=pgbench_branches.bid and
pgbench_accounts.aid=tmp_pgbench_accounts.aid;
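
In this form the whole list collapses into a single PostgreSQL array literal such as '{10000,11000,...}'. The sketch below is a minimal illustration, assuming integer ids supplied by the application, and both helper names are hypothetical. Because the list is one value, a client driver could likely bind it as a single parameter (for example `unnest($1::bigint[])`) instead of one bind variable per element. That is the kind of per-statement bind-variable limit the Oracle case above ran into. This has not been benchmarked here.

```python
def to_pg_bigint_array_literal(ids):
    """Format integers as a PostgreSQL array literal, e.g. '{1,2,3}' (hypothetical helper)."""
    return "{" + ",".join(str(int(i)) for i in ids) + "}"


def build_unnest_cte_query(ids):
    """Return the unnest-CTE form of the pgbench query with the array inlined (hypothetical helper)."""
    literal = to_pg_bigint_array_literal(ids)
    return (
        "WITH tmp_pgbench_accounts AS ("
        f"SELECT unnest('{literal}'::bigint[]) AS aid) "
        "SELECT * FROM pgbench_accounts, pgbench_branches, tmp_pgbench_accounts "
        "WHERE pgbench_accounts.bid = pgbench_branches.bid "
        "AND pgbench_accounts.aid = tmp_pgbench_accounts.aid;"
    )


ids = list(range(10000, 30000, 1000))
sql = build_unnest_cte_query(ids)
print(sql)
```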
