PostgreSQL pageinspect 诊断与优化GIN (倒排) 索引合并延迟导致的查询性能下降问题

digoal 2018-09-19

435

作者

digoal

日期

2018-09-19

背景

GIN索引为PostgreSQL数据库多值类型的倒排索引，一条记录可能涉及到多个GIN索引中的KEY，所以如果写入时实时合并索引，会导致IO急剧增加，写入RT必然增加。为了提高写入吞吐，PG允许用户开启GIN索引的延迟合并技术，开启后，数据会先写入pending list，并不是直接写入索引页，当pending list达到一定大小，或者autovacuum 对应表时，会触发pending list合并到索引的动作。

查询时，如果有未合并到索引中的PENDING LIST，那么会查询pending list，同时查询索引也的信息。

如果写入量很多，pending list非常巨大，合并(autovacuum worker做的)速度跟不上时，会导致通过GIN索引查询时查询性能下降。

知道问题的根源，就知道如何解决，以及如何排查。

背景原理

https://www.postgresql.org/docs/11/static/sql-createindex.html

GIN indexes accept different parameters:

1、fastupdate

This setting controls usage of the fast update technique described in Section 66.4.1. It is a Boolean parameter: ON enables fast update, OFF disables it. (Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.

Note

Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries, but does not in itself flush previous entries. You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.

2、gin_pending_list_limit

Custom gin_pending_list_limit parameter. This value is specified in kilobytes.

当前设置

```
postgres=# show gin_pending_list_limit ;
gin_pending_list_limit

4MB
(1 row)
```

BRIN indexes accept different parameters:

1、pages_per_range

Defines the number of table blocks that make up one block range for each entry of a BRIN index (see Section 67.1 for more details). The default is 128.

2、autosummarize

Defines whether a summarization run is invoked for the previous page range whenever an insertion is detected on the next one.

通过pageinspect可观察索引的pending list等内容。

https://www.postgresql.org/docs/11/static/pageinspect.html

postgres=# create extension pageinspect ; CREATE EXTENSION

例子

1、建表

postgres=# create table t(id int, arr int[]); CREATE TABLE

2、创建倒排索引

postgres=# create index idx_t_1 on t using gin (arr); CREATE INDEX

3、创建生成随机数组的函数

postgres=# create or replace function gen_rand_arr() returns int[] as $$ select array(select (100*random())::int from generate_series(1,64)); $$ language sql strict; CREATE FUNCTION

4、写入测试数据

postgres=# insert into t select generate_series(1,100000), gen_rand_arr(); INSERT 0 100000 postgres=# insert into t select generate_series(1,1000000), gen_rand_arr(); INSERT 0 1000000

5、通过pageinspect插件，观察当前GIN索引的pendinglist大小，可以看到pending page有356个，涉及2484条记录。

如果很多条记录在pending list中，查询性能会下降明显。

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_t_1', 0)); pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version --------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+--------- 2 | 369 | 3640 | 356 | 2848 | 2 | 1 | 0 | 0 | 2 (1 row)

6、查询测试1，（pending list大于0）

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];
QUERY PLAN

Bitmap Heap Scan on public.t (cost=82.38..262.28 rows=11373 width=284) (actual time=82.444..141.559 rows=114906 loops=1)
Output: id, arr
Recheck Cond: (t.arr @> '{1,2,3}'::integer[])
Heap Blocks: exact=41304
Buffers: shared hit=42043
-> Bitmap Index Scan on idx_t_1 (cost=0.00..79.92 rows=11373 width=0) (actual time=75.902..75.902 rows=114906 loops=1)
Index Cond: (t.arr @> '{1,2,3}'::integer[])
Buffers: shared hit=739
Planning Time: 0.092 ms
Execution Time: 152.260 ms
(10 rows)
```

7、vacuum table，强制合并pending list

```
set vacuum_cost_delay=0;

postgres=# vacuum t;
VACUUM
```

8、观察pendign list合并后，n_pending_tuples等于0.

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_t_1', 0)); pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version --------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+--------- 4294967295 | 4294967295 | 0 | 0 | 0 | 9978 | 41 | 9421 | 101 | 2 (1 row)

9、查询测试2，（pending list = 0）

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];
QUERY PLAN

Bitmap Heap Scan on public.t (cost=792.36..1699.10 rows=117244 width=284) (actual time=79.861..139.603 rows=114906 loops=1)
Output: id, arr
Recheck Cond: (t.arr @> '{1,2,3}'::integer[])
Heap Blocks: exact=41304
Buffers: shared hit=41687
-> Bitmap Index Scan on idx_t_1 (cost=0.00..766.95 rows=117244 width=0) (actual time=73.360..73.360 rows=114906 loops=1)
Index Cond: (t.arr @> '{1,2,3}'::integer[])
Buffers: shared hit=383 -- 大幅减少
Planning Time: 0.135 ms
Execution Time: 150.656 ms
(10 rows)
```

与此类似，brin也有类似的情况。

处理方法类似。

真实用户场景case

用户用PG作为业务的搜索引擎，任意字段组合查询，模糊查询，全文检索等。数据有大量的更新，更新按KEY值进行，在更新前有一次查询（也是KEY值查询）。但是用户图省事，所有字段都放在了gin索引上，包括key字段。

用户这么做会发生什么呢？1、查询时走的是gin索引，2、大量更新操作会导致gin pending list增大。3、查询如果有大量的pending list，会导致cpu消耗在list扫描上，总耗时增加。4、gin使用的是bitmapscan, 有cpu recheck的消耗，总体比index scan更耗费cpu。最后在高并发情况下cpu很快就打爆。

解决办法：把key字段从gin索引里面拿出来放到btree索引里面，数据库会优先选择btree，查询更快，cpu消耗更低。如果key还有其他任意组合查询，还是可以继续保留在gin里面，也就是说key有btree同时有gin.

复现

1、安装插件

create extension btree_gin; create extension pg_trgm; create extension pageinspect;

2、创建有问题的表和索引，所有字段都放在一个gin索引里面，1000万记录。

```
create table test(id int, c1 int, c2 int, c3 int, c4 int, c5 text);

insert into test select generate_series(1,10000000), random()100, random()1000, random()10000, random()100000, md5(random()::text);

create index idx_test_1 on test using gin (id,c1,c2,c3,c4,c5 gin_trgm_ops);
```

3、执行计划

explain (analyze,verbose,timing,costs,buffers) select * from test where id=1; explain (analyze,verbose,timing,costs,buffers) update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=1;

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id=1;
QUERY PLAN

Bitmap Heap Scan on public.test (cost=0.10..0.12 rows=1 width=53) (actual time=0.143..0.143 rows=1 loops=1)
Output: id, c1, c2, c3, c4, c5
Recheck Cond: (test.id = 1)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_test_1 (cost=0.00..0.10 rows=1 width=0) (actual time=0.098..0.098 rows=1 loops=1)
Index Cond: (test.id = 1)
Buffers: shared hit=4
Planning Time: 2.089 ms
Execution Time: 0.922 ms
(10 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) update test set c1=random()100, c2=random()1000, c3=random()10000, c4=random()100000 where id=1;
QUERY PLAN

Update on public.test (cost=0.10..0.15 rows=1 width=59) (actual time=1.342..1.343 rows=0 loops=1)
Buffers: shared hit=9 read=1 dirtied=1
-> Bitmap Heap Scan on public.test (cost=0.10..0.15 rows=1 width=59) (actual time=0.053..0.054 rows=1 loops=1)
Output: id, (random() * '100'::double precision), (random() * '1000'::double precision), (random() * '10000'::double precision), (random() * '100000'::double precision), c5, ctid
Recheck Cond: (test.id = 1)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_test_1 (cost=0.00..0.10 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (test.id = 1)
Buffers: shared hit=4
Planning Time: 0.149 ms
Execution Time: 1.496 ms
(12 rows)
```

4、压测，更新操作，id在gin里面，4960 qps

vi test.sql \set id random(1,10000000) update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 595656
latency average = 12.887 ms
latency stddev = 27.841 ms
tps = 4960.755350 (including connections establishing)
tps = 4963.713963 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set id random(1,10000000)
12.924 update test set c1=random()100, c2=random()1000, c3=random()10000, c4=random()100000 where id=:id;
```

压测过程中，可以看到gin索引的pending page很多.

```
SELECT * FROM gin_metapage_info(get_raw_page('idx_test_1', 0));

优化方法

把key拿出来，模糊查询字段依旧保留在gin里面，c1,c2,c3,c4如果有必要，还是可以在gin里面

```
create table test1(id int, c1 int, c2 int, c3 int, c4 int, c5 text);
insert into test1 select * from test;
create index idx_test1_1 on test1 using btree(id);
create index idx_test1_2 on test1 using btree(c1);
create index idx_test1_3 on test1 using btree(c2);
create index idx_test1_4 on test1 using btree(c3);
create index idx_test1_5 on test1 using btree(c4);
create index idx_test1_6 on test1 using gin(c5 gin_trgm_ops);

或
create index idx_test1_1 on test1 using btree(id);
create index idx_test1_2 on test1 using gin(id,c1,c2,c3,c4,c5 gin_trgm_ops);
```

执行计划

```
explain (analyze,verbose,timing,costs,buffers) select * from test1 where id=1;
explain (analyze,verbose,timing,costs,buffers) update test1 set c1=random()100, c2=random()1000, c3=random()10000, c4=random()100000 where id=1;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where id=1;
QUERY PLAN

Index Scan using idx_test1_1 on public.test1 (cost=0.38..0.42 rows=1 width=53) (actual time=0.089..0.090 rows=1 loops=1)
Output: id, c1, c2, c3, c4, c5
Index Cond: (test1.id = 1)
Buffers: shared hit=1 read=3
Planning Time: 0.696 ms
Execution Time: 0.115 ms
(6 rows)

postgres=#
postgres=# explain (analyze,verbose,timing,costs,buffers) update test1 set c1=random()100, c2=random()1000, c3=random()10000, c4=random()100000 where id=1;
QUERY PLAN

Update on public.test1 (cost=0.38..0.44 rows=1 width=59) (actual time=1.183..1.183 rows=0 loops=1)
Buffers: shared hit=11 read=13 dirtied=6
-> Index Scan using idx_test1_1 on public.test1 (cost=0.38..0.44 rows=1 width=59) (actual time=0.019..0.021 rows=1 loops=1)
Output: id, (random() * '100'::double precision), (random() * '1000'::double precision), (random() * '10000'::double precision), (random() * '100000'::double precision), c5, ctid
Index Cond: (test1.id = 1)
Buffers: shared hit=4
Planning Time: 0.135 ms
Execution Time: 1.246 ms
(8 rows)
```

压测，qps提升到了43231，将近10倍

vi test1.sql \set id random(1,10000000) update test1 set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;

```
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 120

transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 5187765
latency average = 1.480 ms
latency stddev = 30.987 ms
tps = 43212.076731 (including connections establishing)
tps = 43231.697380 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set id random(1,10000000)
1.479 update test1 set c1=random()100, c2=random()1000, c3=random()10000, c4=random()100000 where id=:id;
```

压测过程中gin索引依旧会有pending产生，但是不要紧，因为不需要走它来查询，所以update效率不会下降。

性能提升将近10倍

https://www.postgresql.org/docs/12/pgtrgm.html

https://www.postgresql.org/docs/12/btree-gin.html

https://www.postgresql.org/docs/12/pageinspect.html

小结

数据库为了降低索引引入的写RT升高，采用了延迟合并的方法。如果数据库长期写压力巨大，可能导致未合并的LIST很大，导致查询性能受到影响。

使用pageinspect插件可以观察未合并的pending list有多大。

使用vacuum可以强制合并pending list，提高查询性能。

参考

https://www.postgresql.org/docs/11/static/pageinspect.html

https://www.postgresql.org/docs/11/static/sql-createindex.html

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL pageinspect 诊断与优化GIN (倒排) 索引合并延迟导致的查询性能下降问题

作者

日期

标签

背景

背景原理

GIN indexes accept different parameters:

BRIN indexes accept different parameters:

通过pageinspect可观察索引的pending list等内容。

例子

与此类似，brin也有类似的情况。

真实用户场景case

复现

优化方法

性能提升将近10倍

小结

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论

相关阅读