作者
digoal
日期
2020-08-03
标签
PostgreSQL , hashagg , hll , hash分组数 , pg_leftmost_one_post32
背景
hll是一种近似类型, 少量空间可以存储大量的唯一值, 类似bloom.
《PostgreSQL 大量IO扫描、计算浪费的优化 - 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》
《PostgreSQL 推荐系统优化总计 - 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》
使用hll存储hashagg的分组唯一值个数, 相比记录条数, 可以更准确的反映每个分组的大小, 选择合适的分组数.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fd734f387d8780d9989d750942d026167de8cf3c
```
Use pg_bitutils for HyperLogLog.
author Jeff Davis jdavis@postgresql.org
Thu, 30 Jul 2020 23:44:58 +0800 (08:44 -0700)
committer Jeff Davis jdavis@postgresql.org
Fri, 31 Jul 2020 00:14:23 +0800 (09:14 -0700)
commit fd734f387d8780d9989d750942d026167de8cf3c
tree 07ec4db85bd5c2af6fc4252cb9e1a4549fe8c1fc tree | snapshot
parent f1af75c5f2516ec5b20cfe4b3a474071a318ae1e commit | diff
Use pg_bitutils for HyperLogLog.
Using pg_leftmost_one_post32() yields substantial performance benefits.
Backpatching to version 13 because HLL is used for HashAgg
improvements in 9878b643, which was also backpatched to 13.
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzkGvDKVDo+0YvfvZ+1CE=iCi88DCOGFF3i1hTGGaxcKPw@mail.gmail.com
Backpatch-through: 13
```
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9878b643f37b1e4167f64a9941244bfabed60623
```
HashAgg: use better cardinality estimate for recursive spilling.
author Jeff Davis jdavis@postgresql.org
Wed, 29 Jul 2020 14:15:47 +0800 (23:15 -0700)
committer Jeff Davis jdavis@postgresql.org
Wed, 29 Jul 2020 14:16:28 +0800 (23:16 -0700)
commit 9878b643f37b1e4167f64a9941244bfabed60623
tree f211bc0c45f888f8fea8d3827a2e2ae6be4fa0ea tree | snapshot
parent f2130e77da51f35d37fd15a343bc1c4a4527e0fd commit | diff
HashAgg: use better cardinality estimate for recursive spilling.
Use HyperLogLog to estimate the group cardinality in a spilled
partition. This estimate is used to choose the number of partitions if
we recurse.
The previous behavior was to use the number of tuples in a spilled
partition as the estimate for the number of groups, which lead to
overpartitioning. That could cause the number of batches to be much
higher than expected (with each batch being very small), which made it
harder to interpret EXPLAIN ANALYZE results.
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com
Backpatch-through: 13
```
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.