
Preface
I remember mentioning in an earlier short post that PostgreSQL 16 can also improve the performance of concurrent bulk data loading with COPY by up to 300%. I had been meaning to find time to try it out. Rereading the documentation, I found that COPY itself offers no new options, so the claim presumably refers to the performance of multiple COPY commands executing concurrently. Based on that guess, I set out to verify it.
Hands-on analysis
The brief note in the documentation still left me unsure, so I went digging through PostgreSQL's recent commit history for the relevant changelog entry. I found the concrete explanation here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=00d1e02be24987180115e371abaeb84738257ae2
hio: Use ExtendBufferedRelBy() to extend tables more efficiently

While we already had some form of bulk extension for relations, it was fairly
limited. It only amortized the cost of acquiring the extension lock, the
relation itself was still extended one-by-one. Bulk extension was also solely
triggered by contention, not by the amount of data inserted.

To address this, use ExtendBufferedRelBy(), introduced in 31966b151e6, to
extend the relation. We try to extend the relation by multiple blocks in two
situations:

1) The caller tells RelationGetBufferForTuple() that it will need multiple
pages. For now that's only used by heap_multi_insert(), see commit FIXME.

2) If there is contention on the extension lock, use the number of waiters for
the lock as a multiplier for the number of blocks to extend by. This is
similar to what we already did. Previously we additionally multiplied the
numbers of waiters by 20, but with the new relation extension
infrastructure I could not see a benefit in doing so.

Using the freespacemap to provide empty pages can cause significant
contention, and adds measurable overhead, even if there is no contention. To
reduce that, remember the blocks the relation was extended by in the
BulkInsertState, in the extending backend. In case 1) from above, the blocks
the extending backend needs are not entered into the FSM, as we know that we
will need those blocks.

One complication with using the FSM to record empty pages, is that we need to
insert blocks into the FSM, when we already hold a buffer content lock. To
avoid doing IO while holding a content lock, release the content lock before
recording free space. Currently that opens a small window in which another
backend could fill the block, if a concurrent VACUUM records the free
space. If that happens, we retry, similar to the already existing case when
otherBuffer is provided. In the future it might be worth closing the race by
preventing VACUUM from recording the space in newly extended pages.

This change provides very significant wins (3x at 16 clients, on my
workstation) for concurrent COPY into a single relation. Even single threaded
COPY is measurably faster, primarily due to not dirtying pages while
extending, if supported by the operating system (see commit 4d330a61bb1). Even
single-row INSERTs benefit, although to a much smaller degree, as the relation
extension lock rarely is the primary bottleneck.

Reviewed-by: Melanie Plageman melanieplageman@gmail.com
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de

In short: on the commit author's workstation this change gave a 3x win at 16 clients for concurrent COPY into a single relation; even single-threaded COPY is measurably faster, mainly from not dirtying pages while extending, where the operating system supports it (see commit 4d330a61bb1); and single-row INSERTs benefit only slightly, since the relation extension lock is rarely their main bottleneck.
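To make the heuristic concrete, here is a loose Python sketch of the extend-by decision as I read the commit message. This is an illustration only, not the actual C implementation; the real code's exact formula and clamping are not shown here.

def blocks_to_extend_by(pages_needed: int, lock_waiters: int) -> int:
    # Illustrative sketch only -- NOT the real PostgreSQL C code.
    # Case 1: the caller (currently only heap_multi_insert()) announces
    # how many pages it will need, so extend by at least that many.
    # Case 2: under contention, scale the extension by the number of
    # backends waiting on the relation extension lock; the old code
    # additionally multiplied the waiter count by 20, the new one doesn't.
    return max(pages_needed, 1 + lock_waiters)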
With this explanation in hand, we can run a few quick tests: a simple comparison of 14.8 against 16.0 Beta. To keep things simple, archive_mode is off. The test machine has 8 GB RAM; cpuinfo is as follows:
[10:21:07-postgres@sean-rh1:/var/lib/pgsql/16/data]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
stepping : 7
microcode : 0x500320a
cpu MHz : 2394.375
....
processor : 3
....
So: 4 cores of Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz.
Scenario 1: single-threaded COPY
On PostgreSQL 14.8: create table t and prepare the data.
mydb=# select version();
 version
 PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

mydb=# create table t(id int, col2 varchar(32));
CREATE TABLE
mydb=# \timing on
Timing is on.
mydb=# insert into t select n, 'test' || n from generate_series(1, 10000000) as n;
INSERT 0 10000000
Time: 9690.842 ms (00:09.691)
So inserting 10 million rows takes about 9.7 seconds, roughly a million rows per second.
Now copy the data out, empty the table, and copy it back in, timing each step.
mydb=# \copy t to '/iihero/tmp/t.dat';
COPY 10000000
Time: 2242.262 ms (00:02.242)

mydb=# copy t from '/iihero/tmp/t.dat';
COPY 10000000
Time: 4385.167 ms (00:04.385)

mydb=# drop table t;
DROP TABLE
Time: 80.719 ms
mydb=# create table t(id int, col2 varchar(32));
CREATE TABLE
Time: 1.703 ms
mydb=# copy t from '/iihero/tmp/t.dat';
COPY 10000000
Time: 4435.175 ms (00:04.435)
Loading the data back in with COPY takes about 4.4 seconds.
On PostgreSQL 16:
[10:22:26-postgres@sean-rh1:/var/lib/pgsql/16/data]$ psql
psql (16beta1 [By iihero])
Type "help" for help.

postgres=# create table t(id int, col2 varchar(32));
CREATE TABLE
postgres=# \timing on
Timing is on.
postgres=# copy t from '/iihero/tmp/t.dat';
COPY 10000000
Time: 4446.794 ms (00:04.447)
Again about 4.4 seconds, so the two versions look essentially identical here. That is not too surprising: per the commit message, the single-threaded win comes from not dirtying pages while extending and depends on operating system support (commit 4d330a61bb1), which may simply not kick in on this machine.
Scenario 2: concurrent COPY
First, prepare a data set of 20,000 rows.
mydb=# \copy t to '/iihero/tmp/t20k.dat';
COPY 20000
Time: 4.282 ms
16 concurrent clients on PostgreSQL 14.8:
Let's have pgbench drive 16 clients, each running the script copy.sql, for a total of 60 seconds:
\copy t from '/iihero/tmp/t20k.dat';

pgbench -U postgres -c 16 -f copy.sql -T 60
Which ran straight into a small tragedy:
[11:06:42-postgres@sean-rh1:/var/lib/pgsql]$ pgbench -U postgres -f perf.sql
pgbench: fatal: perf.sql:2: invalid command in command "copy"
\copy t(id,col2) from '/iihero/tmp/t20k.dat';
So pgbench cannot execute a \copy statement in a custom script. In hindsight this makes sense: \copy is a psql meta-command rather than SQL, so pgbench's script parser rejects it (see the workaround sketch at the end of this post). Or perhaps my eyes deceived me and I simply used it wrong?
Not wanting to dwell on this, I threw together a simple Python simulation that repeatedly invokes psql to load the data, for one minute of total run time.
Re-prepare the data:

postgres=# truncate t;
TRUNCATE TABLE
postgres=# insert into t select n, 'test' || n from generate_series(1, 1000) as n;
INSERT 0 1000
postgres=# \copy t to '/iihero/tmp/t1k.dat';
COPY 1000
The Python 3 script below runs 10 concurrent threads, each using \copy to load 1,000 rows per call, for one minute of total run time. (Threads are fine here despite the GIL, since the real work happens in the child psql processes.)
import subprocess
import threading
import time
import sys

# Run psql in a loop for about one minute, loading 1,000 rows per call
def run_psql_command():
    start_time = time.time()
    while time.time() - start_time < 60:
        command = 'psql -U postgres -c "\\copy t(id,col2) from \'/iihero/tmp/t1k.dat\';"'
        subprocess.check_call(
            command,
            stdout=sys.stdout,
            stderr=sys.stdout,
            shell=True)
        time.sleep(5 / 1000)  # 5 ms pause between loads

# Launch 10 threads that each execute the psql command in a loop
threads = []
for _ in range(10):
    thread = threading.Thread(target=run_psql_command)
    thread.start()
    threads.append(thread)

# Wait for all threads to finish
for thread in threads:
    thread.join()
[11:49:17-postgres@sean-rh1:/var/lib/pgsql/script/python]$ python3.6 perf.py

After it finishes, check how much was loaded:

[11:51:17-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
----------
 20798000
(1 row)
The total loaded: 20,798,000 rows, i.e. roughly 347,000 rows per second.
Good. Now the same procedure on PostgreSQL 16:
[11:53:25-postgres@sean-rh1:/var/lib/pgsql/script/python]$ which psql
/usr/pgsql-16build/bin/psql

[11:53:40-postgres@sean-rh1:/var/lib/pgsql/script/python]$ python3.6 perf.py

[11:53:51-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
----------
 23798000
(1 row)
The total this time is 23,798,000 rows. That is an improvement over PG14 (23,798,000 / 20,798,000 ≈ 1.14, about 14% faster), but nowhere near 300%.
Additional verification
In a skeptical spirit, I changed the 10 concurrent threads to a single thread, ran it serially for one minute, and checked the result:
[11:57:22-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
---------
 3747000
(1 row)
Hmm: 3,747,000 rows, a clear gap from the roughly 20.8 million above (about 5.5x), which further confirms that running multiple \copy commands concurrently does improve throughput.
The code above sleeps 5 ms per iteration; this time I removed the sleep, used 16 threads to load, and compared again:
PG14:
[12:01:18-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
----------
 20437000
(1 row)
With 16 threads and no sleep, still only 20,437,000 rows went in.
And on PG16:
[12:03:39-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
----------
 22908000
(1 row)
Somewhat better than PG14: 22,908,000 rows, about 12% more.
Single thread, no sleep.
PG14:
[11:53:40-postgres@sean-rh1:/var/lib/pgsql/script/python]$ python3.6 perf.py

[12:06:17-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
---------
 5520000
(1 row)
PG16:
[12:04:09-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "truncate t"
TRUNCATE TABLE
[12:07:26-postgres@sean-rh1:/var/lib/pgsql/script/python]$ python3.6 perf.py
[12:08:38-postgres@sean-rh1:/var/lib/pgsql/script/python]$ psql -c "select count(*) from t"
  count
---------
 6151000
(1 row)
Summary:
PG16's COPY performance has indeed improved. The 300% figure probably holds only under specific conditions: if you compare multiple concurrent COPYs against a single process, you can certainly reach that multiple and beyond. But comparing PG16 against PG14 at the same level of concurrency, there is a gain, though, judging from the tests so far, nowhere near that much.
Single thread: one minute of COPY loads, 1,000 rows per load; PG16 reached 6,151,000 rows versus 5,520,000 for PG14 (about 11% more).
16 concurrent threads: one minute of COPY loads, 1,000 rows per load; PG16 reached 22,908,000 rows versus 20,437,000 for PG14 (about 12% more).
In both cases PG16 beats PG14 by a modest margin. We can also see that raising the concurrency does push throughput to more than 3x the single-thread figure (22,908,000 / 6,151,000 ≈ 3.7).
As for the exact scenario the commit message describes, these straightforward tests are all I can draw conclusions from for now.
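One caveat worth noting: every iteration of the test script pays psql's process-startup and connection-setup cost, so the numbers above measure more than COPY itself, and that fixed overhead may mask part of the PG16 gain. A hypothetical variant (assuming psycopg2 is installed; I did not run this) that keeps one long-lived connection per thread would isolate the COPY path better:

import threading
import time

import psycopg2  # assumed available; not used in the original test

def copy_loop():
    # One long-lived connection per thread: no per-iteration psql
    # startup and no repeated connection setup.
    conn = psycopg2.connect(dbname="postgres", user="postgres")
    conn.autocommit = True
    cur = conn.cursor()
    deadline = time.time() + 60  # run for about one minute
    while time.time() < deadline:
        # Stream the file through COPY ... FROM STDIN, the client-side
        # equivalent of psql's \copy.
        with open("/iihero/tmp/t1k.dat") as f:
            cur.copy_expert("COPY t(id, col2) FROM STDIN", f)
    conn.close()

threads = [threading.Thread(target=copy_loop) for _ in range(10)]
for th in threads:
    th.start()
for th in threads:
    th.join()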
Another mildly irritating discovery: pgbench does not support \copy statements in custom SQL files, which is a bit of a tragedy; I would rather not have written my own multi-threaded simulation. Perhaps someone knows a way to make pgbench run it; if so, please leave me a comment.
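One idea I have not verified: server-side COPY ... FROM 'file' is plain SQL rather than a psql meta-command, so pgbench should be able to send it, provided the server process can read the file (superuser, or membership in the pg_read_server_files role). A sketch, with a hypothetical script name perf-copy.sql:

-- perf-copy.sql (untested): server-side COPY is plain SQL, so pgbench's
-- parser accepts it; note the *server* process reads the file.
COPY t(id, col2) FROM '/iihero/tmp/t1k.dat';

pgbench -U postgres -c 16 -f perf-copy.sql -T 60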





