作者
digoal
日期
2021-03-20
标签
PostgreSQL , default_toast_compression , lz4 , 压缩 , toast
背景
PostgreSQL Heap table 支持列压缩(指某行的可变长度列, 最大1GB), 当存储在某行的可变长度列的空间占用超过TOAST_TUPLE_THRESHOLD时, 可变列的内容将压缩后存储在这个表对应的toast表中.
压缩方法默认为pglz, PostgreSQL 14支持lz4算法, 要支持lz4必须在编译postgresql时选择 --with-lz4.
pglz压缩比lz4省cpu, 而lz4比pglz压缩比高.
create table, alter table可以指定表的压缩算法, 也能设置GUC参数default_toast_compression指定默认压缩算法.
pg_column_compression(column) 可以查询当前行的指定列使用了什么压缩算法.
PSQL \d+ 表名
可以查询表的压缩算法
GUC参数default_toast_compression查询当前默认压缩算法
注意:
为了提高效率, 在使用alter table修改压缩算法后, 已压缩的内容不会改变, 例如 CREATE TABLE AS or INSERT .. SELECT , 原有记录的压缩算法保持不变, 新增的记录才会使用新压缩算法.
而VACUUM FULL or CLUSTER 操作会使用新压缩算法对所有记录重算.
更多toast相关内容参考:
《PostgreSQL 14 preview - vacuum PROCESS_TOAST 开关 (是否vacuum 相关 toast)》
《PostgreSQL 14 preview - psql客户端增强, 支持desc toast表和索引》
《PostgreSQL 11 preview - TOAST切片阈值表级可配置》
《HOW to Change PostgreSQL's TOAST_TUPLE_THRESHOLD》
《how difference when update a table's column which it in TOAST or BASETABLE(in-line)》
《TOAST,The Oversized-Attribute Storage Technique - 暨存储格式main, extended, external, plain介绍》
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294
```
Allow configurable LZ4 TOAST compression.
author Robert Haas rhaas@postgresql.org
Fri, 19 Mar 2021 19:10:38 +0000 (15:10 -0400)
committer Robert Haas rhaas@postgresql.org
Fri, 19 Mar 2021 19:10:38 +0000 (15:10 -0400)
commit bbe0a81db69bd10bd166907c3701492a29aca294
tree 13d4a0dd3309af4f7970a1edbb61da0d974a2448 tree
parent e589c4890b05044a04207c2797e7c8af6693ea5f commit | diff
Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
```
\set HIDE_TOAST_COMPRESSION false
2 -- test creating table with compression method
3 CREATE TABLE cmdata(f1 text COMPRESSION pglz);
4 CREATE INDEX idx ON cmdata(f1);
5 INSERT INTO cmdata VALUES(repeat('1234567890', 1000));
6 \d+ cmdata
7 Table "public.cmdata"
8 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
9 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
10 f1 | text | | | | extended | pglz | |
11 Indexes:
12 "idx" btree (f1)
13
14 CREATE TABLE cmdata1(f1 TEXT COMPRESSION lz4);
15 INSERT INTO cmdata1 VALUES(repeat('1234567890', 1004));
16 \d+ cmdata1
17 Table "public.cmdata1"
18 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
19 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
20 f1 | text | | | | extended | lz4 | |
21
22 -- verify stored compression method in the data
23 SELECT pg_column_compression(f1) FROM cmdata;
24 pg_column_compression
25 -----------------------
26 pglz
27 (1 row)
28
29 SELECT pg_column_compression(f1) FROM cmdata1;
30 pg_column_compression
31 -----------------------
32 lz4
33 (1 row)
34
35 -- decompress data slice
36 SELECT SUBSTR(f1, 200, 5) FROM cmdata;
37 substr
38 --------
39 01234
40 (1 row)
41
42 SELECT SUBSTR(f1, 2000, 50) FROM cmdata1;
43 substr
44 ----------------------------------------------------
45 01234567890123456789012345678901234567890123456789
46 (1 row)
47
48 -- copy with table creation
49 SELECT * INTO cmmove1 FROM cmdata;
50 \d+ cmmove1
51 Table "public.cmmove1"
52 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
53 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
54 f1 | text | | | | extended | pglz | |
55
56 SELECT pg_column_compression(f1) FROM cmmove1;
57 pg_column_compression
58 -----------------------
59 pglz
60 (1 row)
61
62 -- copy to existing table
63 CREATE TABLE cmmove3(f1 text COMPRESSION pglz);
64 INSERT INTO cmmove3 SELECT * FROM cmdata;
65 INSERT INTO cmmove3 SELECT * FROM cmdata1;
66 SELECT pg_column_compression(f1) FROM cmmove3;
67 pg_column_compression
68 -----------------------
69 pglz
70 lz4
71 (2 rows)
72
73 -- test LIKE INCLUDING COMPRESSION
74 CREATE TABLE cmdata2 (LIKE cmdata1 INCLUDING COMPRESSION);
75 \d+ cmdata2
76 Table "public.cmdata2"
77 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
78 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
79 f1 | text | | | | extended | lz4 | |
80
81 DROP TABLE cmdata2;
82 -- try setting compression for incompressible data type
83 CREATE TABLE cmdata2 (f1 int COMPRESSION pglz);
84 ERROR: column data type integer does not support compression
85 -- update using datum from different table
86 CREATE TABLE cmmove2(f1 text COMPRESSION pglz);
87 INSERT INTO cmmove2 VALUES (repeat('1234567890', 1004));
88 SELECT pg_column_compression(f1) FROM cmmove2;
89 pg_column_compression
90 -----------------------
91 pglz
92 (1 row)
93
94 UPDATE cmmove2 SET f1 = cmdata1.f1 FROM cmdata1;
95 SELECT pg_column_compression(f1) FROM cmmove2;
96 pg_column_compression
97 -----------------------
98 lz4
99 (1 row)
100
101 -- test externally stored compressed data
102 CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
103 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
104 CREATE TABLE cmdata2 (f1 text COMPRESSION pglz);
105 INSERT INTO cmdata2 SELECT large_val() || repeat('a', 4000);
106 SELECT pg_column_compression(f1) FROM cmdata2;
107 pg_column_compression
108 -----------------------
109 pglz
110 (1 row)
111
112 INSERT INTO cmdata1 SELECT large_val() || repeat('a', 4000);
113 SELECT pg_column_compression(f1) FROM cmdata1;
114 pg_column_compression
115 -----------------------
116 lz4
117 lz4
118 (2 rows)
119
120 SELECT SUBSTR(f1, 200, 5) FROM cmdata1;
121 substr
122 --------
123 01234
124 8f14e
125 (2 rows)
126
127 SELECT SUBSTR(f1, 200, 5) FROM cmdata2;
128 substr
129 --------
130 8f14e
131 (1 row)
132
133 DROP TABLE cmdata2;
134 --test column type update varlena/non-varlena
135 CREATE TABLE cmdata2 (f1 int);
136 \d+ cmdata2
137 Table "public.cmdata2"
138 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
139 --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
140 f1 | integer | | | | plain | | |
141
142 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE varchar;
143 \d+ cmdata2
144 Table "public.cmdata2"
145 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
146 --------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
147 f1 | character varying | | | | extended | pglz | |
148
149 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE int USING f1::integer;
150 \d+ cmdata2
151 Table "public.cmdata2"
152 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
153 --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
154 f1 | integer | | | | plain | | |
155
156 --changing column storage should not impact the compression method
157 --but the data should not be compressed
158 ALTER TABLE cmdata2 ALTER COLUMN f1 TYPE varchar;
159 \d+ cmdata2
160 Table "public.cmdata2"
161 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
162 --------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
163 f1 | character varying | | | | extended | pglz | |
164
165 ALTER TABLE cmdata2 ALTER COLUMN f1 SET STORAGE plain;
166 \d+ cmdata2
167 Table "public.cmdata2"
168 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
169 --------+-------------------+-----------+----------+---------+---------+-------------+--------------+-------------
170 f1 | character varying | | | | plain | pglz | |
171
172 INSERT INTO cmdata2 VALUES (repeat('123456789', 800));
173 SELECT pg_column_compression(f1) FROM cmdata2;
174 pg_column_compression
175 -----------------------
176
177 (1 row)
178
179 -- test compression with materialized view
180 CREATE MATERIALIZED VIEW mv(x) AS SELECT * FROM cmdata1;
181 \d+ mv
182 Materialized view "public.mv"
183 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
184 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
185 x | text | | | | extended | pglz | |
186 View definition:
187 SELECT cmdata1.f1 AS x
188 FROM cmdata1;
189
190 SELECT pg_column_compression(f1) FROM cmdata1;
191 pg_column_compression
192 -----------------------
193 lz4
194 lz4
195 (2 rows)
196
197 SELECT pg_column_compression(x) FROM mv;
198 pg_column_compression
199 -----------------------
200 lz4
201 lz4
202 (2 rows)
203
204 -- test compression with partition
205 CREATE TABLE cmpart(f1 text COMPRESSION lz4) PARTITION BY HASH(f1);
206 CREATE TABLE cmpart1 PARTITION OF cmpart FOR VALUES WITH (MODULUS 2, REMAINDER 0);
207 CREATE TABLE cmpart2(f1 text COMPRESSION pglz);
208 ALTER TABLE cmpart ATTACH PARTITION cmpart2 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
209 INSERT INTO cmpart VALUES (repeat('123456789', 1004));
210 INSERT INTO cmpart VALUES (repeat('123456789', 4004));
211 SELECT pg_column_compression(f1) FROM cmpart1;
212 pg_column_compression
213 -----------------------
214 lz4
215 (1 row)
216
217 SELECT pg_column_compression(f1) FROM cmpart2;
218 pg_column_compression
219 -----------------------
220 pglz
221 (1 row)
222
223 -- test compression with inheritence, error
224 CREATE TABLE cminh() INHERITS(cmdata, cmdata1);
225 NOTICE: merging multiple inherited definitions of column "f1"
226 ERROR: column "f1" has a compression method conflict
227 DETAIL: pglz versus lz4
228 CREATE TABLE cminh(f1 TEXT COMPRESSION lz4) INHERITS(cmdata);
229 NOTICE: merging column "f1" with inherited definition
230 ERROR: column "f1" has a compression method conflict
231 DETAIL: pglz versus lz4
232 -- test default_toast_compression GUC
233 SET default_toast_compression = '';
234 ERROR: invalid value for parameter "default_toast_compression": ""
235 DETAIL: default_toast_compression cannot be empty.
236 SET default_toast_compression = 'I do not exist compression';
237 ERROR: invalid value for parameter "default_toast_compression": "I do not exist compression"
238 DETAIL: Compression method "I do not exist compression" does not exist.
239 SET default_toast_compression = 'lz4';
240 DROP TABLE cmdata2;
241 CREATE TABLE cmdata2 (f1 text);
242 \d+ cmdata2
243 Table "public.cmdata2"
244 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
245 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
246 f1 | text | | | | extended | lz4 | |
247
248 -- test alter compression method
249 ALTER TABLE cmdata ALTER COLUMN f1 SET COMPRESSION lz4;
250 INSERT INTO cmdata VALUES (repeat('123456789', 4004));
251 \d+ cmdata
252 Table "public.cmdata"
253 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
254 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
255 f1 | text | | | | extended | lz4 | |
256 Indexes:
257 "idx" btree (f1)
258
259 SELECT pg_column_compression(f1) FROM cmdata;
260 pg_column_compression
261 -----------------------
262 pglz
263 lz4
264 (2 rows)
265
266 -- test alter compression method for the materialized view
267 ALTER MATERIALIZED VIEW mv ALTER COLUMN x SET COMPRESSION lz4;
268 \d+ mv
269 Materialized view "public.mv"
270 Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
271 --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
272 x | text | | | | extended | lz4 | |
273 View definition:
274 SELECT cmdata1.f1 AS x
275 FROM cmdata1;
276
277 -- test alter compression method for the partitioned table
278 ALTER TABLE cmpart1 ALTER COLUMN f1 SET COMPRESSION pglz;
279 ALTER TABLE cmpart2 ALTER COLUMN f1 SET COMPRESSION lz4;
280 -- new data should be compressed with the current compression method
281 INSERT INTO cmpart VALUES (repeat('123456789', 1004));
282 INSERT INTO cmpart VALUES (repeat('123456789', 4004));
283 SELECT pg_column_compression(f1) FROM cmpart1;
284 pg_column_compression
285 -----------------------
286 lz4
287 pglz
288 (2 rows)
289
290 SELECT pg_column_compression(f1) FROM cmpart2;
291 pg_column_compression
292 -----------------------
293 pglz
294 lz4
295 (2 rows)
296
297 --vacuum full to recompress the data
298 SELECT pg_column_compression(f1) FROM cmdata;
299 pg_column_compression
300 -----------------------
301 pglz
302 lz4
303 (2 rows)
304
305 VACUUM FULL cmdata;
306 SELECT pg_column_compression(f1) FROM cmdata;
307 pg_column_compression
308 -----------------------
309 lz4
310 lz4
311 (2 rows)
312
313 -- check data is ok
314 SELECT length(f1) FROM cmdata;
315 length
316 --------
317 10000
318 36036
319 (2 rows)
320
321 SELECT length(f1) FROM cmdata1;
322 length
323 --------
324 10040
325 12449
326 (2 rows)
327
328 SELECT length(f1) FROM cmmove1;
329 length
330 --------
331 10000
332 (1 row)
333
334 SELECT length(f1) FROM cmmove2;
335 length
336 --------
337 10040
338 (1 row)
339
340 SELECT length(f1) FROM cmmove3;
341 length
342 --------
343 10000
344 10040
345 (2 rows)
346
347 \set HIDE_TOAST_COMPRESSION true
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.