PostgreSQL 13 特性解读
作者
digoal
日期
2020-06-05
标签
PostgreSQL , 13 , 新特性
背景
PostgreSQL 13 特性解读
https://www.postgresql.org/docs/13/release-13.html
1
Allow partitionwise joins to happen in more cases (Ashutosh Bapat, Etsuro Fujita, Amit Langote, Tom Lane)
For example, partitionwise joins can now happen between partitioned tables even when their partition bounds do not match exactly.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c8434d64ce03c32e0029417a82ae937f2055268f
Previously, the partitionwise join technique only allowed partitionwise
join when input partitioned tables had exactly the same partition
bounds. This commit extends the technique to some cases when the tables
have different partition bounds, by using an advanced partition-matching
algorithm introduced by this commit. For both the input partitioned
tables, the algorithm checks whether every partition of one input
partitioned table only matches one partition of the other input
partitioned table at most, and vice versa. In such a case the join
between the tables can be broken down into joins between the matching
partitions, so the algorithm produces the pairs of the matching
partitions, plus the partition bounds for the join relation, to allow
partitionwise join for computing the join. Currently, the algorithm
works for list-partitioned and range-partitioned tables, but not
hash-partitioned tables. See comments in partition_bounds_merge().
2
Allow partitioned tables to be logically replicated via publications (Amit Langote)
Previously, partitions had to be replicated individually. Now partitioned tables can be published explicitly causing all partitions to be automatically published. Addition/removal of partitions from partitioned tables are automatically added/removed from publications. The CREATE PUBLICATION option publish_via_partition_root controls whether changes to partitions are published as their own or their ancestor’s.
Allow logical replication into partitioned tables on subscribers (Amit Langote)
Previously, subscribers could only receive rows into non-partitioned tables.
3
More efficiently store duplicates in btree indexes (Anastasia Lubennikova, Peter Geoghegan)
This allows efficient btree indexing of low cardinality columns by storing duplicate keys only once. Users upgrading with pg_upgrade will need to use REINDEX to make use of this feature.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0d861bbb702f8aa05c2a4e3f1650e7e8df8c8c27
https://www.postgresql.org/docs/13/btree-implementation.html#BTREE-DEDUPLICATION
类似btree_gin但优秀, key+tid(s), 场景: 重复值较多的索引, 更新较多的场景(key->多版本tids). 参数 deduplicate_items 控制, 默认on.
Deduplication reduces the storage overhead of duplicates in indexes that
use the standard nbtree index access method. The deduplication process
is applied lazily, after the point where opportunistic deletion of
LP_DEAD-marked index tuples occurs. Deduplication is only applied at
the point where a leaf page split would otherwise be required. New
posting list tuples are formed by merging together existing duplicate
tuples. The physical representation of the items on an nbtree leaf page
is made more space efficient by deduplication, but the logical contents
of the page are not changed. Even unique indexes make use of
deduplication as a way of controlling bloat from duplicates whose TIDs
point to different versions of the same logical table row.
The lazy approach taken by nbtree has significant advantages over a GIN
style eager approach. Most individual inserts of index tuples have
exactly the same overhead as before. The extra overhead of
deduplication is amortized across insertions, just like the overhead of
page splits. The key space of indexes works in the same way as it has
since commit dd299df8 (the commit that made heap TID a tiebreaker
column).
Testing has shown that nbtree deduplication can generally make indexes
with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
smaller, even with single column integer indexes (e.g., an index on a
referencing column that accompanies a foreign key). The final size of
single column nbtree indexes comes close to the final size of a similar
contrib/btree_gin index, at least in cases where GIN’s posting list
compression isn’t very effective. This can significantly improve
transaction throughput, and significantly reduce the cost of vacuuming
indexes.
A new index storage parameter (deduplicate_items) controls the use of
deduplication. The default setting is ‘on’, so all new B-Tree indexes
automatically use deduplication where possible. This decision will be
reviewed at the end of the Postgres 13 beta period.
4
Allow gin indexes to more efficiently handle NOT restrictions (Nikita Glukhov, Alexander Korotkov, Tom Lane, Julien Rouhaud)
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4b754d6c16e16cc1a1adf12ab0f48603069a0efd
https://www.postgresql.org/message-id/flat/CAOBaU_YGP5-BEt5Cc0%3DzMve92vocPzD%2BXiZgiZs1kjY0cj%3DXBg%40mail.gmail.com
消除冗余full index scan扫描
-- master
EXPLAIN ANALYZE SELECT * FROM test WHERE LIKE '%1234%' AND t LIKE '%1%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=11777.99..16421.73 rows=7999 width=32) (actual time=65.431..65.857 rows=300 loops=1)
Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
Rows Removed by Index Recheck: 2
Heap Blocks: exact=114
-> Bitmap Index Scan on test_t_idx (cost=0.00..11775.99 rows=7999 width=0) (actual time=65.380..65.380 rows=302 loops=1)
Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
Planning Time: 0.151 ms
Execution Time: 65.900 ms
(8 rows)
-- patched
EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
Rows Removed by Index Recheck: 2
Heap Blocks: exact=114
-> Bitmap Index Scan on test_t_idx (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
Planning Time: 0.080 ms
Execution Time: 0.450 ms
(8 rows)
create table t (a int[], b int[], c int[]);
-- 1M NULLs
insert into t select NULL, NULL, NULL
from generate_series(0, 999999) i;
-- 1M 1-element arrays
insert into t select array[i], array[i], array[i]
from generate_series(0, 999999) i;
-- 10k 2-element arrays with common element
insert into t select array[-1,i], array[-1,i], array[-1,i]
from generate_series(0, 9999) i;
create extension intarray;
create index on t using gin (a gin__int_ops, b gin__int_ops, c gin__int_ops);
| Query time, ms
WHERE condition | master | patches
| | #1 | #2 | #3 | #4
---------------------------------------+--------+------+------+------+------
a @> '{}' | 272 | 473 | 369 | 271 | 261
a @> '{}' and b @> '{}' | 374 | 548 | 523 | 368 | 353
a @> '{}' and b @> '{}' and c @> '{}' | 479 | 602 | 665 | 461 | 446
a @> '{}' and a @@ '1' | 52.2 | 0.4 | 0.4 | 0.4 | 0.4
a @> '{}' and a @@ '-1' | 56.2 | 4.0 | 4.0 | 2.3 | 2.3
a @@ '!-1' and a @@ '1' | 52.8 | 53.0 | 52.7 | 52.9 | 0.3
a @@ '!1' and a @@ '-1' | 54.9 | 55.2 | 55.1 | 55.3 | 2.4
5
E.1.3.1.3. Optimizer
Implement incremental sorting (James Coleman, Alexander Korotkov, Tomas Vondra)
If a result is already sorted by several leading keys, this allows for batch sorting of additional trailing keys because the previous keys are already equal. This is controlled by enable_incrementalsort.
c1 sorted,
order by c1,c2 use incremental sort
6
Allow hash aggregation to use disk storage for large aggregation result sets (Jeff Davis)
Previously, hash aggregation was avoided if it was expected to use more than work_mem memory. This is controlled by enable_hashagg_disk.
Allow grouping sets to use hash aggregation with disk storage for large grouping set results (Jeff Davis)
Previously, hash aggregation was avoided if it was expected to use more than work_mem memory. This is controlled by enable_hashagg_disk.
7
Allow inserts to trigger autovacuum activity (Laurenz Albe, Darafei Praliaskouski)
This new behavior reduces the work necessary when the table needs to be frozen and allows pages to be set as all-visible. All-visible pages allow index-only scans to access fewer heap rows.
8
Add maintenance_io_concurrency to control I/O concurrency for maintenance operations (Thomas Munro)
PrefetchBuffer , 异步IO, 异步读.
9
Allow skipping of WAL for “full table writes” if wal_level is minimal (Kyotaro Horiguchi)
Relations larger than wal_skip_threshold will have their files fsync’ed rather than generating WAL. Previously this was done only for COPY operations, but the implementation had a bug that could cause data loss during crash recovery.
wal_level=minimal, 重写表时, 不写wal日志.
10
Allow EXPLAIN, auto_explain, autovacuum, and pg_stat_statements to track WAL usage statistics (Kirill Bychik, Julien Rouhaud)
11
Allow a sample of statements to be logged (Adrien Nayrat)
A log_statement_sample_rate ratio of statements taking over log_min_duration_sample duration will be logged.
https://www.postgresql.org/docs/13/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
采样执行时间超过log_min_duration_sample的sql比例.
12
Improve control of prepared statement parameter logging (Alexey Bashtanov, Álvaro Herrera)
The GUC setting log_parameter_max_length controls the maximum length of parameter values output during statement non-error logging, and log_parameter_max_length_on_error does the same for error statement logging. Previously, prepared statement parameters were not logged during errors.
绑定变量sql的参数, 日志输出. 即使sql执行错误也输出.
13
Allow function call backtraces of errors to be logged (Peter Eisentraut, Álvaro Herrera)
Server variable backtrace_functions specifies which C functions should generate backtraces on error.
14
Add system view pg_stat_progress_basebackup to report the progress of streaming base backups (Fujii Masao)
Add system view pg_stat_progress_analyze to report ANALYZE progress (Álvaro Herrera, Tatsuro Yamada, Vinayak Pokale)
15
Add columns to the pg_stat_replication system view to report how much logical decoding information has been spilled to disk (Tomas Vondra)
16
Add system view pg_shmem_allocations to display shared memory usage (Andres Freund, Robert Haas)
postgres=# select * from pg_shmem_allocations;
name | off | size | allocated_size
-------------------------------------+-----------+-----------+----------------
Buffer IO Locks | 140660096 | 524288 | 524288
Buffer Descriptors | 5393792 | 1048576 | 1048576
Async Queue Control | 147076992 | 2492 | 2560
Wal Sender Ctl | 147070464 | 1280 | 1280
AutoVacuum Data | 147062016 | 5368 | 5376
PROCLOCK hash | 143136000 | 2904 | 2944
FinishedSerializableTransactions | 146097664 | 16 | 128
XLOG Ctl | 53504 | 4208272 | 4208384
Shared MultiXact State | 5392640 | 1028 | 1152
Proc Header | 146231552 | 104 | 128
Backend Client Host Name Buffer | 146444672 | 8256 | 8320
ReplicationSlot Ctl | 147067392 | 2400 | 2432
CommitTs | 4791424 | 133568 | 133632
KnownAssignedXids | 146341888 | 31720 | 31744
Prepared Transaction Table | 146585088 | 16 | 128
BTree Vacuum State | 147074688 | 1476 | 1536
Checkpoint BufferIds | 141184384 | 327680 | 327680
Wal Receiver Ctl | 147071744 | 2248 | 2304
PREDICATELOCKTARGET hash | 143835392 | 2904 | 2944
Backend Status Array | 146381568 | 54696 | 54784
KnownAssignedXidsValid | 146373632 | 7930 | 7936
Shared Buffer Lookup Table | 141512064 | 2904 | 2944
CommitTs shared | 4925056 | 32 | 128
Backend Application Name Buffer | 146436352 | 8256 | 8320
ProcSignal | 146660352 | 8264 | 8320
Logical Replication Launcher Data | 147074048 | 424 | 512
MultiXactMember | 5259008 | 133568 | 133632
Buffer Blocks | 6442368 | 134217728 | 134217728
Proc Array | 146341248 | 528 | 640
OldSnapshotControlData | 147074560 | 68 | 128
PMSignalState | 146659328 | 1020 | 1024
PREDICATELOCK hash | 144280704 | 2904 | 2944
PredXactList | 145541120 | 88 | 128
Notify | 147079552 | 66816 | 66816
Fast Path Strong Relation Lock Data | 143831168 | 4100 | 4224
RWConflictPool | 145804672 | 24 | 128
Xact | 4262272 | 529152 | 529152
Buffer Strategy Status | 142440704 | 28 | 128
Serial | 146097792 | 133568 | 133632
SerialControlData | 146231424 | 12 | 128
shmInvalBuffer | 146589824 | 69464 | 69504
Subtrans | 4925184 | 267008 | 267008
Sync Scan Locations List | 147076224 | 656 | 768
Control File | 4261888 | 296 | 384
ReplicationOriginState | 147069824 | 568 | 640
SERIALIZABLEXID hash | 145746304 | 2904 | 2944
Backend Activity Buffer | 146452992 | 132096 | 132096
Background Worker Data | 146585216 | 4496 | 4608
MultiXactOffset | 5192192 | 66816 | 66816
LOCK hash | 142440832 | 2904 | 2944
Checkpointer Data | 146668672 | 393280 | 393344
<anonymous> | | 4855424 | 4855424
| 147275392 | 1794432 | 1794432
(53 rows)
17
Create pg_stat_slru system view to monitor internal SLRU caches (Tomas Vondra)
postgres=# select * from pg_stat_slru;
name | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates | stats_reset
-----------------+-------------+----------+-----------+--------------+-------------+---------+-----------+------------------------------
CommitTs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
MultiXactMember | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
MultiXactOffset | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
Notify | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
Serial | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
Subtrans | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
Xact | 0 | 126 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
other | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2020-05-28 17:24:53.03567+08
(8 rows)
18
Allow streaming replication configuration settings to be changed by reload (Sergei Kornilov)
Previously, a server restart was required to change primary_conninfo and primary_slot_name.
不需要重启standby , 修改上游连接和slot.
19
Allow replication slot storage to be limited by max_slot_wal_keep_size (Kyotaro Horiguchi)
Replication slots that exceed this value are invalidated.
限制slot最大保留wal 文件, 很多用户会忘记删除僵尸或不用的slot, 导致wal堆积.
20
Allow control over how much memory is used by logical decoding before it is spilled to disk (Tomas Vondra, Dilip Kumar, Amit Kapila)
This is controlled by logical_decoding_work_mem.
逻辑decode内存控制
21
Allow WAL recovery to continue even if invalid pages are referenced (Fujii Masao)
This is enabled using ignore_invalid_pages.
recovery 异常控制开关
22
Allow VACUUM to process indexes in parallel (Masahiko Sawada, Amit Kapila)
The new PARALLEL option controls this.
单个table有多个索引时, 可以并行vacuum索引
23
Allow FETCH FIRST to use WITH TIES to return any additional rows that match the last result row (Surafel Temesgen)
https://github.com/digoal/blog/blob/master/202005/20200528_01.md
帮派 limit.
24
Add polymorphic data types for use by functions requiring compatible arguments (Pavel Stehule)
The new data types are anycompatible, anycompatiblearray, anycompatiblenonarray, and anycompatiblerange.
Add SQL data type xid8 to expose FullTransactionId (Thomas Munro)
The xid data type is only four bytes so does not show the transaction epoch.
25
Allow ROW values values to have their members extracted with suffix notation (Tom Lane)
For example, (ROW(4, 5.0)).f1 returns 4.
可以少定义一些type, 例如在cte 递归中使用row返回单列(sub query约束), 可以引用row内部字段了.
26
Add function gen_random_uuid() to generate version 4 UUIDs (Peter Eisentraut)
Previously UUID generation functions were only available via external modules uuid-ossp and pgcrypto.
使用uuid函数, 终于不需要安装 uuid-ossp 插件了
27
Change the default minimum TLS version from 1.0 to 1.2 (Peter Eisentraut)
This is controlled by ssl_min_protocol_version.
安全
Add libpq connection parameters to control the min/max TLS version (Daniel Gustafsson)
The settings are ssl_min_protocol_version and ssl_max_protocol_version.
安全
28
Allow pgbench to partition its ‘accounts’ table (Fabien Coelho)
This allows performance testing of partitioning.
Allow pgbench to generate its data server-side, rather than client side (Fabien Coelho)
服务端采用generate_series写入大量数据. 不需要网络传输, 特别适合网络不好的测试环境.
29
Generate backup manifests for base backups, and verify them (Robert Haas)
A new tool pg_verifybackup can verify backups.
检测备份是否有效.
30
Have pg_basebackup estimate the total backup size by default (Fujii Masao)
This computation allows pg_stat_progress_analyze to show progress, and can be disabled by using the --no-estimate-size option. Previously, this computation happened only if --progress was used.
31
Allow reindexdb to operate in parallel (Julien Rouhaud)
Parallel mode is enabled with the new --jobs option.
允许同时创建多个index
32
Allow extensions to be specified as trusted (Tom Lane)
Such extensions can be installed in a database by users with creation rights, even if they are not superusers. This change also removes the pg_pltemplate system catalog.
例如plproxy插件, 可以直接指定为trust, 那么普通用户就能使用plproxy来创建函数.
方便管理. 在某些内部安全环境中, 便于普通用户使用一些高权限的插件, 例如pythonu
33
Add adminpack function pg_file_sync() to allow fsync’ing a file (Fujii Masao)
Add pageinspect functions to output t_infomask/t_infomask2 values in human-readable format (Craig Ringer, Sawada Masahiko, Michael Paquier)
Add btree index deduplication processing columns to pageinspect output (Peter Geoghegan)