PostgreSQL：TOAST压缩和toast_tuple_target

飞象数据 2023-01-16

1330

PostgreSQL 11 中出现了一些关于 TOAST 的提醒和一个新特性。

什么是TOAST？

你有没有想过 Postgres 是如何存储超过块大小的行的？提醒一下，默认块大小为 8KB。

Postgres 使用一种称为 TOAST 的机制来实现超大属性存储技术。

当记录变得太大而无法存储在块中时，Postgres 会将其存储在“旁边”，在toast表中。记录将被分成块，因此主表（称为堆）将包含一个指针（chunk_id）指向 toast 表中的正确块。

这个块将存储在几行中，对于一个 chunk_id 我们可以在这个 toast 表中有几行。因此，这个 toast 表由 3 列组成：

chunk_id : 堆中引用的块号
chunk_seq : 一个块的每个段的编号
chunk_data : 每个段的数据部分

实际情况有点复杂，事实上 Postgres 会尽量避免将数据存储在 toast 表中。如果行超过TOAST_TUPLE_THRESHOLD
(2Kb)，它将尝试压缩列以尝试将行放入块中。更准确地说，大小必须小于TOAST_TUPLE_TARGET
（默认为 2Kb，稍后我们将讨论）。

如果幸运的话，压缩的行将适合堆。如果没有，它将尝试压缩列，从最大到最小，并将它们存储在 toast 部分中，直到剩余的列适合堆的一行。1个

还要注意，如果压缩增益太小，它认为花费资源尝试压缩是没有用的。因此，它不压缩地存储数据。

你在用meta命令\d+ table
显示表的特性时有没有注意过“Storage”这一列？

stackoverflow=# \d+ posts
                   Table "public.posts"
    Column     |  Type   | Collation | Nullable | Default | Storage  |
---------------+---------+-----------+----------+---------+----------+
 id            | integer |           | not null |         | plain    |
 posttypeid    | integer |           | not null |         | plain    |
 score         | integer |           |          |         | plain    |
 viewcount     | integer |           |          |         | plain    |
 body          | text    |           |          |         | extended |
复制

在此示例中，该列采用值plain或extended。事实上，根据数据类型，有 4 种可能的值：

plain：该列仅存储在堆中且未压缩。
extended：如有必要，可以压缩该列并将其存储在 toast 中。
external：该列可以存储在 toast 中，但不能压缩。有时，可以使用此模式以更高的磁盘空间消耗为代价来提高性能（避免压缩/解压缩）。
main：该列仅存储在堆中，但与普通模式不同，允许压缩。

乍一看，我们可能认为优势主要在于有机会存储超过块大小的行，并压缩数据以节省磁盘空间。

还有一个好处：当更新一行时，如果“toasted”列没有被修改，Postgres 不需要更新 toast 表。这避免了必须解压缩和重新压缩 toast 并将所有这些写入事务日志。

我们将看到另一个优点是 Postgres 可以在没有必要的情况下避免读取 toast。

JSONB 示例

为了研究这一点，我们将使用 JSONB 类型。一般来说，我不推荐使用这种类型：

您失去了拥有架构的优势：

类型检查
完整性约束
没有外键

编写查询变得更加复杂
没有统计某个json字段的key
当我们为每一行存储键时存储效率的损失
没有 JSONB 的部分更新。如果您更改密钥，则必须对整个 JSONB进行detoast和toast
没有部分detoast：如果我们只想读取 JSONB 的一个键，我们将不得不detoast整个 JSONB 3

但是，在一些例外情况下 JSON 可能很有用：

当我们不需要搜索多个字段并通过另一列检索 json 时。（我们不需要 json 的键的统计信息）。
而且，当很难将 json 放入关系模式时。有些情况会涉及到有很多列，其中大部分位于NULL
.

例如，要存储产品特性，其中规范化版本意味着使用大量列，其中大部分是NULL
. 假设您要存储产品，电视将具有特定的特性（屏幕类型、尺寸等）。洗衣机还具有其他特定特征（旋转速度、可接受的重量……）。

因此，我们可以考虑使用“正常”列，包括型号、价格、参考等，以及包含所有特征的列。我们将通过引用访问该行，因此我们将恢复存储在 json 中的产品的所有特征。

我将通过将一些列移动到 jsonb 列（本例中为 jsonfield 列）来重用 Stackoverflow posts表：

\d posts
                            Unlogged table "public.posts"
        Column         |            Type             | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
 id                    | integer                     |           | not null |
 posttypeid            | integer                     |           | not null |
 acceptedanswerid      | integer                     |           |          |
 parentid              | integer                     |           |          |
 creationdate          | timestamp without time zone |           | not null |
 score                 | integer                     |           |          |
 viewcount             | integer                     |           |          |
 body                  | text                        |           |          |
 owneruserid           | integer                     |           |          |
 lasteditoruserid      | integer                     |           |          |
 lasteditordisplayname | text                        |           |          |
 lasteditdate          | timestamp without time zone |           |          |
 lastactivitydate      | timestamp without time zone |           |          |
 title                 | text                        |           |          |
 tags                  | text                        |           |          |
 answercount           | integer                     |           |          |
 commentcount          | integer                     |           |          |
 favoritecount         | integer                     |           |          |
 closeddate            | timestamp without time zone |           |          |
 communityowneddate    | timestamp without time zone |           |          |
 jsonfield             | jsonb
复制

这是一个简单的聚合：

SELECT
  avg(viewcount),
  avg(answercount),
  avg(commentcount),
  avg(favoritecount)
FROM posts;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10265135.77..10265135.78 rows=1 width=128) (actual time=170221.557..170221.558 rows=1 loops=1)
   Buffers: shared hit=1 read=9186137
   I/O Timings: read=138022.290
   ->  Seq Scan on posts  (cost=0.00..9725636.88 rows=53949888 width=16) (actual time=0.014..153665.913 rows=53949886 loops=1)
         Buffers: shared hit=1 read=9186137
         I/O Timings: read=138022.290
 Planning Time: 0.240 ms
 Execution Time: 170221.627 ms
(8 rows)
复制

查询读取 70 GB，执行大约需要 2 分 50 秒。

现在是相同的查询，但这次使用的是 json 中存在的键。

SELECT
  avg((jsonfield ->> 'ViewCount')::int),
  avg((jsonfield ->> 'AnswerCount')::int),
  avg((jsonfield ->> 'CommentCount')::int),
  avg((jsonfield ->> 'FavoriteCount')::int)
FROM posts;
                           QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=11883632.41..11883632.42 rows=1 width=128)
            (actual time=520917.028..520917.030 rows=1 loops=1)
   Buffers: shared hit=241116554 read=13625756
   ->  Seq Scan on posts  (cost=0.00..9725636.88 rows=53949888 width=570)
                          (actual time=0.972..70569.365 rows=53949886 loops=1)
         Buffers: shared read=9186138
 Planning Time: 0.118 ms
 Execution Time: 520945.395 ms
(10 rows)
复制

执行查询大约需要 8 分钟 40 秒。然而，读取的块数似乎有点疯狂：

Seq Scan 和以前一样指示 70Gb。但是，父节点指示读取超过 1.9 TB！

这是具有默认设置的表的大小。您应该知道，对于某些记录，Postgres 要么将行压缩到堆中（内联压缩），要么将其压缩并放入 toast 中。

SELECT
  pg_size_pretty(pg_relation_size(oid)) table_size,
  pg_size_pretty(pg_relation_size(reltoastrelid)) toast_size
FROM pg_class
WHERE relname = 'posts';


 table_size | toast_size
------------+-----------
 70 GB      | 33 GB
(1 row)
复制

1.9 TB 读取如何解释？

出于好奇，我进行了相同的查询，但通过一次聚合，我得到了大约 538 GB。

有几个问题：

我怎么知道 Postgres 是否会读 toast？
为什么“标准列”版本和 jsonb 字段之间的执行时间差异如此之大？
节点中的计数器Aggregate
对应什么？

要回答第一个问题，只需阅读pg_statio_user_tables
视图即可。

在执行查询之前：

select relid,schemaname,relname,heap_blks_read,heap_blks_hit,toast_blks_read,toast_blks_hit from pg_statio_all_tables where relname in ('posts','pg_toast_26180851');
  relid   | schemaname |      relname      | heap_blks_read | heap_blks_hit | toast_blks_read | toast_blks_hit
----------+------------+-------------------+----------------+---------------+-----------------+----------------
 26180851 | public     | posts             |      422018238 |      87673549 |       129785076 |      628153337
 26180854 | pg_toast   | pg_toast_26180851 |      129785076 |     628153337 |                 |
(2 rows)
复制

之后

  relid   | schemaname |      relname      | heap_blks_read | heap_blks_hit | toast_blks_read | toast_blks_hit
----------+------------+-------------------+----------------+---------------+-----------------+----------------
 26180851 | public     | posts             |      431204376 |      87673549 |       134156898 |      686299551
 26180854 | pg_toast   | pg_toast_26180851 |      134156898 |     686299551 |                 |
(2 rows)
复制

这给了我们：

SELECT
pg_size_pretty(
    ((431204376 + 87673549) - (422018238 + 87673549) ) * 8*1024::bigint
) heap_buffers,
pg_size_pretty(
    ((134156898 + 686299551) - (129785076 + 628153337) ) * 8*1024::bigint
) toast_buffers;


 heap_buffers | toast_buffers
--------------+---------------
 70 GB        | 477 GB
(1 row)
复制

Postgres 读取toast。但是，计数器表明 Postgres 会多次读取 toast。

如果我做同样的计算，但这次只在一个字段上聚合，我得到 119 GB (~ 477 GB 4) 我猜 Postgres 会为每个函数读取 toast。

那么，执行时间的差异是由于以下几个原因：

Postgres 将不得不读取和detoast（解压缩）toast
对 jsonb 执行其他操作以访问该值

对于第一个查询，Postgres 不必阅读 toast。一方面，它读取的数据更少，另一方面，它不必操作json来识别键并提取要计算的值。

最后，聚合节点的计数器必须对应于将读取 json 的每个函数的解压缩数据。事实上，如果我们将聚合减去表的seqscan，那么toast部分，我们有：

单个字段 468 GB
936 GB，两个字段加倍
4 个字段 1873 GB（大约 4 x 468 GB）

这就解释了为什么价值如此之高。

高级设置

现在，由于Postgres 版本 11 引入的toast_tuple_target选项，我们将鼓励 Postgres 将最大量的数据放入 toast 中。

此选项允许您控制数据存储在toast中的阈值。

此外，在 Postgres 14 下，我借此机会使用了 lz4 压缩算法（参数default_toast_compression）。该算法提供与 pglz 相似的压缩率，但速度要快得多（请参阅什么是 PostgreSQL 14 中新的 LZ4 TOAST 压缩，它有多快？）。

CREATE TABLE posts_toast
  WITH (toast_tuple_target = 128) AS
    SELECT *
    FROM posts;
复制

这是表和toast 表的大小：

SELECT
  pg_size_pretty(pg_relation_size(oid)) table_size,
  pg_size_pretty(pg_relation_size(reltoastrelid)) toast_size
FROM pg_class
WHERE relname = 'posts_toast';


 table_size | toast_size
------------+------------
 59 GB      | 52 GB
复制

总的来说，放toast 表大小差不多。在第一个表的示例中，您应该记住引擎还会压缩堆中的数据。

让我们再次进行聚合查询：

SELECT
  avg(viewcount),
  avg(answercount),
  avg(commentcount),
  avg(favoritecount)
FROM posts_toast;
复制

这次查询读取 59 GB，耗时 2 分 17 秒。我们在此示例中节省了大约 20% 的执行时间。

如果 toast 中存储的部分更大，我们可以节省更多。在堆中读取的数据量会小得多。

出于好奇，我还执行了从 json 字段聚合数据的查询。我的执行时间为 7 分 17 秒。

结论

总结几个数字：

标准聚合，标准存储：2min 50s
聚合类型jsonb，标准存储：8min 40s
标准聚合，使用toast_tuple_target = 128 存储：2分钟 17 秒
聚合类型 jsonb，使用toast_tuple_target = 128 存储：7 分钟 17 秒

我们可以看到使用 JSON 比使用标准类型要昂贵得多。Postgres 必须执行更多操作才能访问 json 键的值。

此外，它必须解压缩 toast 中的数据才能访问它。但是，我们也可以使用toast_tuple_target
参数在 toast 中推送更多信息。因此，在某些情况下，这可以通过避免读取 toast 来减少读取的数据量。

文章转载自飞象数据，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL：TOAST压缩和toast_tuple_target

什么是TOAST？

JSONB 示例

高级设置

结论

评论

相关阅读