作者
digoal
日期
2021-07-09
标签
PostgreSQL , hash分片 , memory copy , rb_contains , 优化
背景
视频回放: https://www.bilibili.com/video/BV1xh41167uN/
实时营销, 人群圈选推荐业务:
用户数, 十亿级别
每天刷新用户标签表. 单一标签最大uid个数, 亿级别
每天重算定向推广规则表. 单一推广包最大uid个数, 亿级别
用户标签表结构:
标签, uids rb -- 千行级别, 每个rb里面可能有百万、千万个uid.
定向推广规则表结构:
包名, uids rb -- 万行级别, 每个rb里面可能有百万、千万个uid.
每个UID最多上万个定向规则.
1、运营或广告主根据标签组合生成定向规则(存入定向推广规则表), 无性能瓶颈.
2、推广逻辑:
对于某个推广位, 登陆app时, 根据运营(推广位)规则映射表, 找到这个推广位里面的定向规则有哪些, 同时检查哪些定向规则匹配当前用户, 根据匹配到的定向规则往这个广告位填充对应的营销内容.
select 标签 from tbl where 标签 in () and rb @> uid;
第2步存在性能瓶颈, 因为rb很大(每个RB有几百万甚至上千万UID, RB字段估计几十到几百MB), 目测为判断uid是否在rb内时 memory copy和RB CONTAIN操作符计算的瓶颈.
优化方案
对rb按hash切分, 拆成多条或单条多字段结构.
标签, hash_val_mod, UIDs rb
select 标签 from tbl where 标签 in () and hash_val_mod=? and rb @> uid;
在mac book pro上, 优化后切分为128个分片, 性能提升50倍.
DEMO
强制索引扫描
alter role postgres set enable_seqscan=off;
alter role postgres set enable_bitmapscan=off;
优化前
```
create unlogged table test (id int, uids roaringbitmap);
create index idx_test_1 on test (id);
create or replace function gen_rb(int, int) returns roaringbitmap as $$
select rb_build_agg((random()*$1)::int) from generate_series(1,$2);
$$ language sql strict;
insert into test select generate_series(1,100), gen_rb(1000000000,5000000);
```
每行500万个UID, 约10MB.
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------+-------+----------+-------------+---------------+---------+-------------
public | test | table | postgres | unlogged | heap | 999 MB |
public | test1 | table | postgres | unlogged | heap | 2425 MB |
(2 rows)
```
postgres=# explain select * from test where id in (1,2,3,4,5,6) and uids @> 10;
QUERY PLAN
Index Scan using idx_test_1 on test (cost=0.14..4.28 rows=1 width=22)
Index Cond: (id = ANY ('{1,2,3,4,5,6}'::integer[]))
Filter: (uids @> 10)
(3 rows)
```
```
vi test.sql
\set v random(1,100)
\set uid random(1,1000000000)
select * from test where id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and uids @> :uid::int;
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -T 120
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 16
duration: 120 s
number of transactions actually processed: 8138
latency average = 236.345 ms
latency stddev = 104.607 ms
initial connection time = 19.508 ms
tps = 67.523957 (without initial connection time)
statement latencies in milliseconds:
0.004 \set v random(1,100)
0.001 \set uid random(1,1000000000)
236.403 select * from test where id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and rb_contains(uids, :ui
```
优化后
```
create unlogged table test1 (id int, hid int, uids roaringbitmap);
create index idx_test1_1 on test1(hid,id);
create or replace function gen_rb1(int, int) returns table(id int, rb roaringbitmap) as $$
select abs(mod(hashint4(uid), 128)) as id, rb_build_agg(uid) as rb from
(select (random()*$1)::int uid from generate_series(1,$2)) t
group by 1;
$$ language sql strict;
insert into test1 select id, (gen_rb1(1000000000,5000000)).* from generate_series(1,100) id;
```
HASH分片为128后, 每行约4万个UID, 约1.9MB.
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------+-------+----------+-------------+---------------+---------+-------------
public | test | table | postgres | unlogged | heap | 999 MB |
public | test1 | table | postgres | unlogged | heap | 2425 MB |
(2 rows)
```
postgres=# explain select * from test1 where hid = abs(mod(hashint4(10), 128)) and id in (1,2,3,4,5,6) and uids @> 10;
QUERY PLAN
Index Scan using idx_test1_1 on test1 (cost=0.28..39.91 rows=1 width=26)
Index Cond: ((hid = 57) AND (id = ANY ('{1,2,3,4,5,6}'::integer[])))
Filter: (uids @> 10)
(3 rows)
```
```
vi test1.sql
\set v random(1,100)
\set uid random(1,1000000000)
select * from test1 where hid = abs(mod(hashint4(:uid::int), 128)) and id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and uids @> :uid::int;
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 16 -j 16 -T 120
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 16
duration: 120 s
number of transactions actually processed: 387850
latency average = 4.955 ms
latency stddev = 11.985 ms
initial connection time = 20.415 ms
tps = 3221.912507 (without initial connection time)
statement latencies in milliseconds:
0.001 \set v random(1,100)
0.001 \set uid random(1,1000000000)
4.957 select * from test1 where hid = abs(mod(hashint4(:uid::int), 128)) and id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :
```
结论
在mac book pro上, 优化后切分为128个分片, 性能提升50倍.
思路与之类似:
《重新发现PostgreSQL之美 - 26 这个推荐算法价值1亿》
《PostgreSQL 推荐系统优化总计 - 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》
《PostgreSQL multipolygon 空间索引查询过滤精简优化 - IO,CPU放大优化》
《PostgreSQL 空间切割(st_split, ST_Subdivide)功能扩展 - 空间对象网格化 (多边形GiST优化)》
《PostgreSQL 空间st_contains,st_within空间包含搜索优化 - 降IO和降CPU(bound box) (多边形GiST优化)》
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.