暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 14 preview - TOAST 支持 lz4 压缩算法 - --with-lz4 , 新增GUC default_toast_compression

digoal 2021-01-03
1535

作者

digoal

日期

2021-03-20

标签

PostgreSQL , default_toast_compression , lz4 , 压缩 , toast


背景

PostgreSQL Heap table 支持列压缩(指某行的可变长度列, 最大1GB), 当存储在某行的可变长度列的空间占用超过TOAST_TUPLE_THRESHOLD时, 可变列的内容将压缩后存储在这个表对应的toast表中.

压缩方法默认为pglz, PostgreSQL 14支持lz4算法, 要支持lz4必须在编译postgresql时选择 --with-lz4.

pglz压缩比lz4省cpu, 而lz4比pglz压缩比高.

create table, alter table可以指定表的压缩算法, 也能设置GUC参数default_toast_compression指定默认压缩算法.

pg_column_compression(column) 可以查询当前行的指定列使用了什么压缩算法.

PSQL \d+ 表名 可以查询表的压缩算法

GUC参数default_toast_compression查询当前默认压缩算法

注意:

为了提高效率, 在使用alter table修改压缩算法后, 已压缩的内容不会改变, 例如 CREATE TABLE AS or INSERT .. SELECT , 原有记录的压缩算法保持不变, 新增的记录才会使用新压缩算法.

而VACUUM FULL or CLUSTER 操作会使用新压缩算法对所有记录重算.

更多toast相关内容参考:

《PostgreSQL 14 preview - vacuum PROCESS_TOAST 开关 (是否vacuum 相关 toast)》

《PostgreSQL 14 preview - psql客户端增强, 支持desc toast表和索引》

《PostgreSQL 11 preview - TOAST切片阈值表级可配置》

《HOW to Change PostgreSQL's TOAST_TUPLE_THRESHOLD》

《how difference when update a table's column which it in TOAST or BASETABLE(in-line)》

《TOAST table with pgfincore》

《TOAST,The Oversized-Attribute Storage Technique - 暨存储格式main, extended, external, plain介绍》

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

```
Allow configurable LZ4 TOAST compression.
author Robert Haas rhaas@postgresql.org
Fri, 19 Mar 2021 19:10:38 +0000 (15:10 -0400)
committer Robert Haas rhaas@postgresql.org
Fri, 19 Mar 2021 19:10:38 +0000 (15:10 -0400)
commit bbe0a81db69bd10bd166907c3701492a29aca294
tree 13d4a0dd3309af4f7970a1edbb61da0d974a2448 tree
parent e589c4890b05044a04207c2797e7c8af6693ea5f commit | diff
Allow configurable LZ4 TOAST compression.

There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
```

\set HIDE_TOAST_COMPRESSION false 2 -- test creating table with compression method 3 CREATE TABLE cmdata(f1 text COMPRESSION pglz); 4 CREATE INDEX idx ON cmdata(f1); 5 INSERT INTO cmdata VALUES(repeat('1234567890', 1000)); 6 \d+ cmdata 7 Table "public.cmdata" 8 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 9 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 10 f1 | text | | | | extended | pglz | | 11 Indexes: 12 "idx" btree (f1) 13 14 CREATE TABLE cmdata1(f1 TEXT COMPRESSION lz4); 15 INSERT INTO cmdata1 VALUES(repeat('1234567890', 1004)); 16 \d+ cmdata1 17 Table "public.cmdata1" 18 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 19 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 20 f1 | text | | | | extended | lz4 | | 21 22 -- verify stored compression method in the data 23 SELECT pg_column_compression(f1) FROM cmdata; 24 pg_column_compression 25 ----------------------- 26 pglz 27 (1 row) 28 29 SELECT pg_column_compression(f1) FROM cmdata1; 30 pg_column_compression 31 ----------------------- 32 lz4 33 (1 row) 34 35 -- decompress data slice 36 SELECT SUBSTR(f1, 200, 5) FROM cmdata; 37 substr 38 -------- 39 01234 40 (1 row) 41 42 SELECT SUBSTR(f1, 2000, 50) FROM cmdata1; 43 substr 44 ---------------------------------------------------- 45 01234567890123456789012345678901234567890123456789 46 (1 row) 47 48 -- copy with table creation 49 SELECT * INTO cmmove1 FROM cmdata; 50 \d+ cmmove1 51 Table "public.cmmove1" 52 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 53 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 54 f1 | text | | | | extended | pglz | | 55 56 SELECT pg_column_compression(f1) FROM cmmove1; 57 pg_column_compression 58 ----------------------- 59 pglz 60 (1 row) 61 62 -- copy to existing table 63 CREATE TABLE cmmove3(f1 text COMPRESSION pglz); 64 INSERT INTO cmmove3 SELECT * FROM cmdata; 65 INSERT INTO cmmove3 SELECT * FROM cmdata1; 66 SELECT pg_column_compression(f1) FROM cmmove3; 67 pg_column_compression 68 ----------------------- 69 pglz 70 lz4 71 (2 rows) 72 73 -- test LIKE INCLUDING COMPRESSION 74 CREATE TABLE cmdata2 (LIKE cmdata1 INCLUDING COMPRESSION); 75 \d+ cmdata2 76 Table "public.cmdata2" 77 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 78 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 79 f1 | text | | | | extended | lz4 | | 80 81 DROP TABLE cmdata2; 82 -- try setting compression for incompressible data type 83 CREATE TABLE cmdata2 (f1 int COMPRESSION pglz); 84 ERROR: column data type integer does not support compression 85 -- update using datum from different table 86 CREATE TABLE cmmove2(f1 text COMPRESSION pglz); 87 INSERT INTO cmmove2 VALUES (repeat('1234567890', 1004)); 88 SELECT pg_column_compression(f1) FROM cmmove2; 89 pg_column_compression 90 ----------------------- 91 pglz 92 (1 row) 93 94 UPDATE cmmove2 SET f1 = cmdata1.f1 FROM cmdata1; 95 SELECT pg_column_compression(f1) FROM cmmove2; 96 pg_column_compression 97 ----------------------- 98 lz4 99 (1 row) 100 101 -- test externally stored compressed data 102 CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 103 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; 104 CREATE TABLE cmdata2 (f1 text COMPRESSION pglz); 105 INSERT INTO cmdata2 SELECT large_val() || repeat('a', 4000); 106 SELECT pg_column_compression(f1) FROM cmdata2; 107 pg_column_compression 108 ----------------------- 109 pglz 110 (1 row) 111 112 INSERT INTO cmdata1 SELECT large_val() || repeat('a', 4000); 113 SELECT pg_column_compression(f1) FROM cmdata1; 114 pg_column_compression 115 ----------------------- 116 lz4 117 lz4 118 (2 rows) 119 120 SELECT SUBSTR(f1, 200, 5) FROM cmdata1; 121 substr 122 -------- 123 01234 124 8f14e 125 (2 rows) 126 127 SELECT SUBSTR(f1, 200, 5) FROM cmdata2; 128 substr 129 -------- 130 8f14e 131 (1 row) 132 133 DROP TABLE cmdata2; 134 --test column type update varlena/non-varlena 135 CREATE TABLE cmdata2 (f1 int); 136 \d+ cmdata2 137 Table "public.cmdata2" 138 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 139 --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- 140 f1 | integer | | | | plain | | | 141 142 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE varchar; 143 \d+ cmdata2 144 Table "public.cmdata2" 145 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 146 --------+-------------------+-----------+----------+---------+----------+-------------+--------------+------------- 147 f1 | character varying | | | | extended | pglz | | 148 149 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE int USING f1::integer; 150 \d+ cmdata2 151 Table "public.cmdata2" 152 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 153 --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- 154 f1 | integer | | | | plain | | | 155 156 --changing column storage should not impact the compression method 157 --but the data should not be compressed 158 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE varchar; 159 \d+ cmdata2 160 Table "public.cmdata2" 161 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 162 --------+-------------------+-----------+----------+---------+----------+-------------+--------------+------------- 163 f1 | character varying | | | | extended | pglz | | 164 165 ALTER TABLE cmdata2 ALTER COLUMN f1 SET STORAGE plain; 166 \d+ cmdata2 167 Table "public.cmdata2" 168 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 169 --------+-------------------+-----------+----------+---------+---------+-------------+--------------+------------- 170 f1 | character varying | | | | plain | pglz | | 171 172 INSERT INTO cmdata2 VALUES (repeat('123456789', 800)); 173 SELECT pg_column_compression(f1) FROM cmdata2; 174 pg_column_compression 175 ----------------------- 176 177 (1 row) 178 179 -- test compression with materialized view 180 CREATE MATERIALIZED VIEW mv(x) AS SELECT * FROM cmdata1; 181 \d+ mv 182 Materialized view "public.mv" 183 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 184 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 185 x | text | | | | extended | pglz | | 186 View definition: 187 SELECT cmdata1.f1 AS x 188 FROM cmdata1; 189 190 SELECT pg_column_compression(f1) FROM cmdata1; 191 pg_column_compression 192 ----------------------- 193 lz4 194 lz4 195 (2 rows) 196 197 SELECT pg_column_compression(x) FROM mv; 198 pg_column_compression 199 ----------------------- 200 lz4 201 lz4 202 (2 rows) 203 204 -- test compression with partition 205 CREATE TABLE cmpart(f1 text COMPRESSION lz4) PARTITION BY HASH(f1); 206 CREATE TABLE cmpart1 PARTITION OF cmpart FOR VALUES WITH (MODULUS 2, REMAINDER 0); 207 CREATE TABLE cmpart2(f1 text COMPRESSION pglz); 208 ALTER TABLE cmpart ATTACH PARTITION cmpart2 FOR VALUES WITH (MODULUS 2, REMAINDER 1); 209 INSERT INTO cmpart VALUES (repeat('123456789', 1004)); 210 INSERT INTO cmpart VALUES (repeat('123456789', 4004)); 211 SELECT pg_column_compression(f1) FROM cmpart1; 212 pg_column_compression 213 ----------------------- 214 lz4 215 (1 row) 216 217 SELECT pg_column_compression(f1) FROM cmpart2; 218 pg_column_compression 219 ----------------------- 220 pglz 221 (1 row) 222 223 -- test compression with inheritence, error 224 CREATE TABLE cminh() INHERITS(cmdata, cmdata1); 225 NOTICE: merging multiple inherited definitions of column "f1" 226 ERROR: column "f1" has a compression method conflict 227 DETAIL: pglz versus lz4 228 CREATE TABLE cminh(f1 TEXT COMPRESSION lz4) INHERITS(cmdata); 229 NOTICE: merging column "f1" with inherited definition 230 ERROR: column "f1" has a compression method conflict 231 DETAIL: pglz versus lz4 232 -- test default_toast_compression GUC 233 SET default_toast_compression = ''; 234 ERROR: invalid value for parameter "default_toast_compression": "" 235 DETAIL: default_toast_compression cannot be empty. 236 SET default_toast_compression = 'I do not exist compression'; 237 ERROR: invalid value for parameter "default_toast_compression": "I do not exist compression" 238 DETAIL: Compression method "I do not exist compression" does not exist. 239 SET default_toast_compression = 'lz4'; 240 DROP TABLE cmdata2; 241 CREATE TABLE cmdata2 (f1 text); 242 \d+ cmdata2 243 Table "public.cmdata2" 244 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 245 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 246 f1 | text | | | | extended | lz4 | | 247 248 -- test alter compression method 249 ALTER TABLE cmdata ALTER COLUMN f1 SET COMPRESSION lz4; 250 INSERT INTO cmdata VALUES (repeat('123456789', 4004)); 251 \d+ cmdata 252 Table "public.cmdata" 253 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 254 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 255 f1 | text | | | | extended | lz4 | | 256 Indexes: 257 "idx" btree (f1) 258 259 SELECT pg_column_compression(f1) FROM cmdata; 260 pg_column_compression 261 ----------------------- 262 pglz 263 lz4 264 (2 rows) 265 266 -- test alter compression method for the materialized view 267 ALTER MATERIALIZED VIEW mv ALTER COLUMN x SET COMPRESSION lz4; 268 \d+ mv 269 Materialized view "public.mv" 270 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description 271 --------+------+-----------+----------+---------+----------+-------------+--------------+------------- 272 x | text | | | | extended | lz4 | | 273 View definition: 274 SELECT cmdata1.f1 AS x 275 FROM cmdata1; 276 277 -- test alter compression method for the partitioned table 278 ALTER TABLE cmpart1 ALTER COLUMN f1 SET COMPRESSION pglz; 279 ALTER TABLE cmpart2 ALTER COLUMN f1 SET COMPRESSION lz4; 280 -- new data should be compressed with the current compression method 281 INSERT INTO cmpart VALUES (repeat('123456789', 1004)); 282 INSERT INTO cmpart VALUES (repeat('123456789', 4004)); 283 SELECT pg_column_compression(f1) FROM cmpart1; 284 pg_column_compression 285 ----------------------- 286 lz4 287 pglz 288 (2 rows) 289 290 SELECT pg_column_compression(f1) FROM cmpart2; 291 pg_column_compression 292 ----------------------- 293 pglz 294 lz4 295 (2 rows) 296 297 --vacuum full to recompress the data 298 SELECT pg_column_compression(f1) FROM cmdata; 299 pg_column_compression 300 ----------------------- 301 pglz 302 lz4 303 (2 rows) 304 305 VACUUM FULL cmdata; 306 SELECT pg_column_compression(f1) FROM cmdata; 307 pg_column_compression 308 ----------------------- 309 lz4 310 lz4 311 (2 rows) 312 313 -- check data is ok 314 SELECT length(f1) FROM cmdata; 315 length 316 -------- 317 10000 318 36036 319 (2 rows) 320 321 SELECT length(f1) FROM cmdata1; 322 length 323 -------- 324 10040 325 12449 326 (2 rows) 327 328 SELECT length(f1) FROM cmmove1; 329 length 330 -------- 331 10000 332 (1 row) 333 334 SELECT length(f1) FROM cmmove2; 335 length 336 -------- 337 10040 338 (1 row) 339 340 SELECT length(f1) FROM cmmove3; 341 length 342 -------- 343 10000 344 10040 345 (2 rows) 346 347 \set HIDE_TOAST_COMPRESSION true

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论