背景
视频回放: https://www.bilibili.com/video/BV1Yb4y1879P/
本文描述在共享存储+多个计算节点的环境中PolarDB for PostgreSQL开源版本的部署实践.
其中共享存储使用的是NBD设备, 这种环境比较容易获取, 你只要有服务器就可以搭建, 降低了学习PolarDB for PG的门槛. 但是不建议生产环境使用, 如果生产环境建议使用商业的SAN存储或者是商业的分布式共享块存储. 搭建方法与本文类似, 跳过NBD的部署使用其他共享块设备替代NBD即可.
环境:
8c32G ecs 共享存储 , 3块网络共享盘 1台 172.25.9.67 ecs rw 1台 172.25.9.68 ecs ro 2台 172.25.9.69 172.25.9.70
复制
PS: ECS需要能连接公网, 安装过程有一些获取cpan, yum等安装操作.
PS: PolarDB for PostgreSQL 开源地址如下: https://github.com/ApsaraDB
1、部署os (所有ECS)
参考 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
2、部署nbd软件 (所有ECS)
参考 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
3、export nbd 共享块设备 (共享存储ECS)
参考 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
查看用于nbd的块设备如下, 有vdb, vdc, vdd 3块盘用于nbd
[root@iZbp10sz66ubwpqzg2ry5gZ ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT vda 253:0 0 100G 0 disk └─vda1 253:1 0 100G 0 part / vdb 253:16 0 100G 0 disk vdc 253:32 0 100G 0 disk vdd 253:48 0 100G 0 disk
复制
编写nbd server配置文件, 注意nbd.conf正常配置的行末尾绝对不要存在空格, 否则会启动失败.
# vi /root/nbd.conf # This is a comment [generic] # The [generic] section is required, even if nothing is specified # there. # When either of these options are specified, nbd-server drops # privileges to the given user and group after opening ports, but # _before_ opening files. # user = nbd # group = nbd listenaddr = 0.0.0.0 port = 1921 [export1] exportname = /dev/vdb readonly = false multifile = false copyonwrite = false flush = true fua = true sync = true [export2] exportname = /dev/vdc readonly = false multifile = false copyonwrite = false flush = true fua = true sync = true [export3] exportname = /dev/vdd readonly = false multifile = false copyonwrite = false flush = true fua = true sync = true
复制
启动nbd-server
# nbd-server -C /root/nbd.conf
复制
# netstat -anp|grep 1921 tcp 0 0 0.0.0.0:1921 0.0.0.0:* LISTEN 1296/nbd-server
复制
4、挂载nbd共享块设备 (rw, ro ECS)
参考 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
重启服务器后, 必须modprobe nbd加载模块, 再挂载nbd
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0 nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1 nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
复制
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0 Negotiation: ..size = 102400MB bs=1024, sz=107374182400 bytes [root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1 Negotiation: ..size = 102400MB bs=1024, sz=107374182400 bytes [root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2 Negotiation: ..size = 102400MB bs=1024, sz=107374182400 bytes
复制
vi /etc/rc.local nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0 nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1 nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2 #nbd-client -d /dev/nbd0 #nbd-client -d /dev/nbd1 #nbd-client -d /dev/nbd2
复制
5、安装pfs软件 (rw, ro ECS)
https://github.com/ApsaraDB/PolarDB-FileSystem/blob/master/Readme-CN.md
yum install -y cmake3 gcc gcc-c++ libaio-devel git unzip ln -s /usr/bin/cmake3 /usr/bin/cmake
复制
https://github.com/HardySimpson/zlog/releases
wget https://github.com/HardySimpson/zlog/archive/refs/tags/1.2.15.tar.gz tar -zxvf 1.2.15.tar.gz cd zlog-1.2.15/ make make install echo "/usr/local/lib" >> /etc/ld.so.conf ldconfig ldconfig -v|grep zlog ... ... libzlog.so.1.2 -> libzlog.so.1.2
复制
su - root cd ~ # 注意分支, 建议使用stable分支或者release分支, 具体请参考对应文档说明. git clone https://github.com/ApsaraDB/PolarDB-FileSystem cd PolarDB-FileSystem ./autobuild.sh ... ... [100%] Linking CXX executable ../../../bin/pfsdaemon [100%] Built target pfsdaemon ~/PolarDB-FileSystem-master end compile, binary's in ./bin, library's in ./lib ./install.sh install pfsd success!
复制
6、配置pfs (rw, ro ECS)
不同块设备可以创建不同的路径, 同一个块设备里面也可以创建不同的目录. 通过这种方式将块设备映射到目录, 从而映射到数据库的表空间.
块设备重命名
PFS仅支持特定字符开头的块设备进行访问,建议所有块设备访问节点都通过软链接使用相同名字访问共享块设备。
NBD客户端主机上执行:
ln -s /dev/nbd0 /dev/nvme0n1 ln -s /dev/nbd1 /dev/nvme0n2 ln -s /dev/nbd2 /dev/nvme0n3
复制
块设备初始化
注意: 只在RW节点执行PFS操作来格式化共享块设备即可:
pfs -C disk mkfs nvme0n1 pfs -C disk mkfs nvme0n2 pfs -C disk mkfs nvme0n3 ... pfs mkfs succeeds!
复制
块设备挂载
在RW, RO节点上,分别启动PFS,挂载共享盘:
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1 /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2 /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3 ... ... pfsdaemon nvme0n1 start success pfsdaemon nvme0n2 start success pfsdaemon nvme0n3 start success
复制
vi /etc/rc.local /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1 /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2 /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3 # /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n1 # /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n2 # /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n3
复制
7、安装polardb软件 (rw, ro ECS)
《PolarDB for PostgreSQL 开源版 - 计算存储分离版(类似Oracle RAC架构) 部署指南》
su - root yum install -y readline-devel zlib-devel perl-CPAN bison flex git cpan -fi Test::More IPC::Run
复制
useradd polardb su - polardb cd ~ wget https://github.com/ApsaraDB/PolarDB-for-PostgreSQL/archive/refs/heads/POLARDB_11_STABLE.zip unzip POLARDB_11_STABLE.zip cd PolarDB-for-PostgreSQL-POLARDB_11_STABLE ./polardb_build.sh --noinit --with-pfsd
复制
8、初始化rw节点 (rw ECS)
su - polardb cd ~ $HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D primary -E UTF8 --lc-collate=C --lc-ctype=en_US.utf8 -U polardb # 共享存储初始化 su - root pfs -C disk mkdir /nvme0n1/shared_data # 这一这一步也是用root执行 /home/polardb/tmp_basedir_polardb_pg_1100_bld/bin/polar-initdb.sh /home/polardb/primary/ /nvme0n1/shared_data/
复制
节点配置
su - polardb cd ~/primary # 注意
复制
打开postgresql.conf,增加以下配置项:
listen_addresses = '0.0.0.0' port = 5432 max_connections = 1000 unix_socket_directories = '., /tmp' tcp_keepalives_idle = 60 tcp_keepalives_interval = 10 tcp_keepalives_count = 10 shared_buffers = 8GB maintenance_work_mem = 1GB dynamic_shared_memory_type = posix parallel_leader_participation = off random_page_cost = 1.1 log_destination = 'csvlog' logging_collector = on log_directory = 'log' log_truncate_on_rotation = on log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose log_lock_waits = on log_statement = 'ddl' log_timezone = 'Asia/Shanghai' log_autovacuum_min_duration = 0 autovacuum_vacuum_cost_delay = 0ms datestyle = 'iso, mdy' timezone = 'Asia/Shanghai' lc_messages = 'en_US.UTF-8' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' default_text_search_config = 'pg_catalog.english' polar_hostid=1 # 注意 polar_enable_shared_storage_mode=on polar_disk_name='nvme0n1' # 注意 polar_datadir='/nvme0n1/shared_data/' # 注意 polar_vfs.localfs_mode=off shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker' polar_storage_cluster_name='disk' synchronous_standby_names='replica1, replica2' # 注意
复制
打开pg_hba.conf,增加以下配置项:
host replication polardb 172.25.9.68/32 trust host replication polardb 172.25.9.69/32 trust host replication polardb 172.25.9.70/32 trust host all all 0.0.0.0/0 md5
复制
9、启动rw (rw ECS)
启动与检查
su - polardb
复制
启动
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/primary
复制
检查
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();' version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit (1 row)
复制
配置环境变量, 方便使用:
su - polardb vi ~/.bashrc # 追加 export PS1="$USER@`/bin/hostname -s`-> " export PGPORT=5432 export PGDATA=/home/polardb/primary export LANG=en_US.utf8 export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export DATE=`date +"%Y%m%d%H%M"` export PATH=$PGHOME/bin:$PATH:. export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=127.0.0.1 export PGUSER=polardb export PGDATABASE=postgres alias rm='rm -i' alias ll='ls -lh' unalias vi
复制
10、只读节点的流复制准备 (rw ECS)
创建相应的replication slot,用于接下来创建的只读节点的物理流复制
su - polardb $HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica1');" $HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica2');"
复制
11、创建表空间(对应pfs不同路径) (rw ECS)
目前不支持多个块设备映射到不同的表空间.
所以如果你有多个盘怎么利用起来? 可以使用lvm2逻辑卷来进行管理. 在nbd server配置即可, 做成1个大的卷
pvcreate vgcreate lvcreate lvextend
复制
这里不展开了, 可以参考:
《PostgreSQL 11 1万亿 tpcb 性能测试 on 阿里云ECS + ESSD + zfs/lvm2条带 + block_size=32K》
其他, 使用PFS可以查看初始化后在共享存储中的内容:
[root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/ File 1 4194304 Mon Nov 8 14:21:58 2021 .pfs-paxos File 1 1073741824 Mon Nov 8 14:22:03 2021 .pfs-journal Dir 1 1280 Mon Nov 8 15:23:05 2021 shared_data total 2105344 (unit: 512Bytes) [root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/shared_data Dir 1 512 Mon Nov 8 15:11:38 2021 base Dir 1 7424 Mon Nov 8 15:11:39 2021 global Dir 1 0 Mon Nov 8 15:11:39 2021 pg_tblspc Dir 1 10368 Mon Nov 8 18:06:41 2021 pg_wal Dir 1 896 Mon Nov 8 18:06:41 2021 pg_logindex Dir 1 0 Mon Nov 8 15:11:39 2021 pg_twophase Dir 1 512 Mon Nov 8 17:57:42 2021 pg_xact Dir 1 0 Mon Nov 8 15:11:40 2021 pg_commit_ts Dir 1 256 Mon Nov 8 15:11:40 2021 pg_multixact Dir 1 512 Mon Nov 8 15:23:18 2021 polar_fullpage total 0 (unit: 512Bytes)
复制
12、初始化ro节点 (ro ECS)
RO 1:
su - polardb
复制
节点初始化
cd ~ $HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica1 # 注意
复制
节点配置
cd ~/replica1 # 注意
复制
打开postgresql.conf,增加以下配置项:
listen_addresses = '0.0.0.0' port = 5432 max_connections = 1000 unix_socket_directories = '., /tmp' tcp_keepalives_idle = 60 tcp_keepalives_interval = 10 tcp_keepalives_count = 10 shared_buffers = 8GB maintenance_work_mem = 1GB dynamic_shared_memory_type = posix parallel_leader_participation = off random_page_cost = 1.1 log_destination = 'csvlog' logging_collector = on log_directory = 'log' log_truncate_on_rotation = on log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose log_lock_waits = on log_statement = 'ddl' log_timezone = 'Asia/Shanghai' log_autovacuum_min_duration = 0 autovacuum_vacuum_cost_delay = 0ms datestyle = 'iso, mdy' timezone = 'Asia/Shanghai' lc_messages = 'en_US.UTF-8' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' default_text_search_config = 'pg_catalog.english' polar_hostid=2 # 注意 polar_enable_shared_storage_mode=on polar_disk_name='nvme0n1' # 注意 polar_datadir='/nvme0n1/shared_data/' # 注意 polar_vfs.localfs_mode=off shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker' polar_storage_cluster_name='disk'
复制
打开pg_hba.conf,增加以下配置项:
host replication polardb 172.25.9.68/32 trust host replication polardb 172.25.9.69/32 trust host replication polardb 172.25.9.70/32 trust host all all 0.0.0.0/0 md5
复制
创建recovery.conf,增加以下配置项:
polar_replica='on' recovery_target_timeline='latest' primary_slot_name='replica1' # 注意 primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica1' # 注意
复制
配置环境变量, 方便使用:
su - polardb vi ~/.bashrc # 追加 export PS1="$USER@`/bin/hostname -s`-> " export PGPORT=5432 export PGDATA=/home/polardb/replica1 # 注意 export LANG=en_US.utf8 export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export DATE=`date +"%Y%m%d%H%M"` export PATH=$PGHOME/bin:$PATH:. export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=127.0.0.1 export PGUSER=polardb export PGDATABASE=postgres alias rm='rm -i' alias ll='ls -lh' unalias vi
复制
RO 2:
节点初始化
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica2 # 注意
复制
节点配置
cd ~/replica2 # 注意
复制
打开postgresql.conf,增加以下配置项:
listen_addresses = '0.0.0.0' port = 5432 max_connections = 1000 unix_socket_directories = '., /tmp' tcp_keepalives_idle = 60 tcp_keepalives_interval = 10 tcp_keepalives_count = 10 shared_buffers = 8GB maintenance_work_mem = 1GB dynamic_shared_memory_type = posix parallel_leader_participation = off random_page_cost = 1.1 log_destination = 'csvlog' logging_collector = on log_directory = 'log' log_truncate_on_rotation = on log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose log_lock_waits = on log_statement = 'ddl' log_timezone = 'Asia/Shanghai' log_autovacuum_min_duration = 0 autovacuum_vacuum_cost_delay = 0ms datestyle = 'iso, mdy' timezone = 'Asia/Shanghai' lc_messages = 'en_US.UTF-8' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' default_text_search_config = 'pg_catalog.english' polar_hostid=3 # 注意 polar_enable_shared_storage_mode=on polar_disk_name='nvme0n1' # 注意 polar_datadir='/nvme0n1/shared_data/' # 注意 polar_vfs.localfs_mode=off shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker' polar_storage_cluster_name='disk'
复制
打开pg_hba.conf,增加以下配置项:
host replication polardb 172.25.9.68/32 trust host replication polardb 172.25.9.69/32 trust host replication polardb 172.25.9.70/32 trust host all all 0.0.0.0/0 md5
复制
创建recovery.conf,增加以下配置项:
polar_replica='on' recovery_target_timeline='latest' primary_slot_name='replica2' # 注意 primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica2' # 注意
复制
配置环境变量, 方便使用:
su - polardb vi ~/.bashrc # 追加 export PS1="$USER@`/bin/hostname -s`-> " export PGPORT=5432 export PGDATA=/home/polardb/replica2 # 注意 export LANG=en_US.utf8 export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export DATE=`date +"%Y%m%d%H%M"` export PATH=$PGHOME/bin:$PATH:. export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=127.0.0.1 export PGUSER=polardb export PGDATABASE=postgres alias rm='rm -i' alias ll='ls -lh' unalias vi
复制
13、启动ro节点 (ro ECS)
RO 1:
启动与检查
su - polardb
复制
启动
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica1
复制
检查
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
复制
RO 2:
启动与检查
su - polardb
复制
启动
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica2
复制
检查
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
复制
14、数据库使用测试 (rw, ro ECS)
实例检查和测试
部署完成后,需要进行实例检查和测试,确保主节点可正常写入数据、只读节点可以正常读取。
登录RW主节点,创建测试表并插入样例数据:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "create table t(t1 int primary key, t2 int);insert into t values (1, 1),(2, 3),(3, 3);"
复制
登录任意RO只读节点,查询刚刚插入的样例数据:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "select * from t;" t1 | t2 ----+---- 1 | 1 2 | 3 3 | 3 (3 rows)
复制
15、其他检查
rw 检查复制状态:
postgres=# select * from pg_stat_replication ; -[ RECORD 1 ]----+------------------------------ pid | 17361 usesysid | 10 usename | polardb application_name | replica1 client_addr | 172.25.9.69 client_hostname | client_port | 56684 backend_start | 2021-11-08 15:34:04.711213+08 backend_xmin | state | streaming sent_lsn | 0/166A090 write_lsn | 0/166A090 flush_lsn | 0/166A090 replay_lsn | 0/166A090 write_lag | flush_lag | replay_lag | sync_priority | 1 sync_state | sync -[ RECORD 2 ]----+------------------------------ pid | 17363 usesysid | 10 usename | polardb application_name | replica2 client_addr | 172.25.9.70 client_hostname | client_port | 45858 backend_start | 2021-11-08 15:34:30.442495+08 backend_xmin | state | streaming sent_lsn | 0/166A090 write_lsn | 0/166A090 flush_lsn | 0/166A090 replay_lsn | 0/166A090 write_lag | flush_lag | replay_lag | sync_priority | 2 sync_state | potential
复制
postgres=# select * from pg_replication_slots ; -[ RECORD 1 ]-------+---------- slot_name | replica1 plugin | slot_type | physical datoid | database | temporary | f active | t active_pid | 17361 xmin | catalog_xmin | restart_lsn | 0/1669C78 confirmed_flush_lsn | -[ RECORD 2 ]-------+---------- slot_name | replica2 plugin | slot_type | physical datoid | database | temporary | f active | t active_pid | 17363 xmin | catalog_xmin | restart_lsn | 0/1669C78 confirmed_flush_lsn |
复制
压测(优化前):
pgbench -i -s 100
复制
只读压测可以所有节点同时开压力
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 120 -S
复制
读写压测在RW节点执行
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 120
复制
查询吞吐基本上是与节点数增加呈现线性提升(因为数据都在内存中).
压测过程
RW节点:
读写压测
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 1200 progress: 192.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 193.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 194.0 s, 559.0 tps, lat 58.408 ms stddev 456.270 progress: 195.0 s, 1616.7 tps, lat 4.959 ms stddev 18.420 progress: 196.0 s, 2153.2 tps, lat 3.709 ms stddev 1.102 progress: 197.0 s, 646.0 tps, lat 3.635 ms stddev 1.042 progress: 198.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 199.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 200.0 s, 283.1 tps, lat 104.779 ms stddev 595.861 progress: 201.0 s, 2214.0 tps, lat 3.620 ms stddev 1.123 progress: 202.0 s, 2153.0 tps, lat 3.709 ms stddev 1.096 progress: 203.0 s, 2377.8 tps, lat 3.369 ms stddev 0.977 progress: 204.0 s, 2313.3 tps, lat 3.460 ms stddev 0.987 progress: 205.0 s, 2329.9 tps, lat 3.429 ms stddev 0.976 progress: 206.0 s, 2283.7 tps, lat 3.508 ms stddev 1.052 progress: 207.0 s, 2098.3 tps, lat 3.809 ms stddev 1.895 progress: 208.0 s, 2340.1 tps, lat 3.417 ms stddev 0.977 progress: 209.0 s, 2052.9 tps, lat 3.902 ms stddev 6.712 progress: 210.0 s, 2346.1 tps, lat 3.408 ms stddev 0.965 progress: 211.0 s, 2316.9 tps, lat 3.452 ms stddev 0.994
复制
RO 节点1,2:
只读压测
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S progress: 254.0 s, 30921.5 tps, lat 1.034 ms stddev 0.892 progress: 255.0 s, 32351.0 tps, lat 0.990 ms stddev 1.660 progress: 256.0 s, 33540.0 tps, lat 0.953 ms stddev 0.859 progress: 257.0 s, 33027.0 tps, lat 0.971 ms stddev 1.040 progress: 258.0 s, 32791.1 tps, lat 0.976 ms stddev 0.631 progress: 259.0 s, 32839.6 tps, lat 0.975 ms stddev 1.837 progress: 260.0 s, 33539.4 tps, lat 0.954 ms stddev 0.527 progress: 261.0 s, 34344.5 tps, lat 0.932 ms stddev 0.984 progress: 262.0 s, 32383.9 tps, lat 0.988 ms stddev 0.618 progress: 263.0 s, 33186.0 tps, lat 0.964 ms stddev 0.512 progress: 264.0 s, 33253.3 tps, lat 0.962 ms stddev 0.497 progress: 265.0 s, 32584.0 tps, lat 0.982 ms stddev 0.466 progress: 266.0 s, 32959.8 tps, lat 0.967 ms stddev 1.310 progress: 267.0 s, 32392.0 tps, lat 0.991 ms stddev 0.701 progress: 268.0 s, 33307.8 tps, lat 0.961 ms stddev 0.505 progress: 269.0 s, 33255.8 tps, lat 0.962 ms stddev 0.589 progress: 270.0 s, 33994.6 tps, lat 0.941 ms stddev 0.449 progress: 271.0 s, 34127.2 tps, lat 0.937 ms stddev 1.441
复制
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S progress: 253.0 s, 32289.2 tps, lat 0.991 ms stddev 0.621 progress: 254.0 s, 32778.7 tps, lat 0.976 ms stddev 0.658 progress: 255.0 s, 32314.9 tps, lat 0.984 ms stddev 1.320 progress: 256.0 s, 31984.6 tps, lat 1.006 ms stddev 1.159 progress: 257.0 s, 31889.4 tps, lat 1.004 ms stddev 0.679 progress: 258.0 s, 32466.4 tps, lat 0.986 ms stddev 0.565 progress: 259.0 s, 33572.2 tps, lat 0.953 ms stddev 0.517 progress: 260.0 s, 33482.7 tps, lat 0.956 ms stddev 0.532 progress: 261.0 s, 33047.4 tps, lat 0.968 ms stddev 0.487 progress: 262.0 s, 32951.0 tps, lat 0.971 ms stddev 0.634 progress: 263.0 s, 34039.1 tps, lat 0.940 ms stddev 1.068 progress: 264.0 s, 33919.8 tps, lat 0.944 ms stddev 0.936 progress: 265.0 s, 34062.7 tps, lat 0.940 ms stddev 0.648 progress: 266.0 s, 31726.4 tps, lat 1.009 ms stddev 0.567 progress: 267.0 s, 34335.7 tps, lat 0.932 ms stddev 1.252 progress: 268.0 s, 33604.4 tps, lat 0.952 ms stddev 0.571 progress: 269.0 s, 34043.3 tps, lat 0.940 ms stddev 0.673 progress: 270.0 s, 33909.3 tps, lat 0.944 ms stddev 0.547
复制
观察延迟, 很低
postgres=# select * from pg_stat_replication ; -[ RECORD 1 ]----+------------------------------ pid | 17361 usesysid | 10 usename | polardb application_name | replica1 client_addr | 172.25.9.69 client_hostname | client_port | 56684 backend_start | 2021-11-08 15:34:04.711213+08 backend_xmin | state | streaming sent_lsn | 2/456D3BF8 write_lsn | 2/456D3BF8 flush_lsn | 2/456D3BF8 replay_lsn | 2/456D3BF8 write_lag | 00:00:00.000147 flush_lag | 00:00:00.000147 replay_lag | 00:00:00.000244 sync_priority | 1 sync_state | sync -[ RECORD 2 ]----+------------------------------ pid | 17363 usesysid | 10 usename | polardb application_name | replica2 client_addr | 172.25.9.70 client_hostname | client_port | 45858 backend_start | 2021-11-08 15:34:30.442495+08 backend_xmin | state | streaming sent_lsn | 2/456D3BF8 write_lsn | 2/456D3BF8 flush_lsn | 2/456D3BF8 replay_lsn | 2/456D3BF8 write_lag | 00:00:00.000517 flush_lag | 00:00:00.000517 replay_lag | 00:00:00.00052 sync_priority | 2 sync_state | potential
复制
分析等待事件:
集中在wal writer, 因为NBD采用tcp网络, 延迟很高是正常的. 另一方面出现0的TPS说明IO被限流了, 我这个环境用的是ECS+云盘的NBD服务器, ECS网络层、云盘都有限流措施, 可能出现0的情况. 下次可以搞个性能好的本地SSD盘服务器做nbd server.
如果是RT遇到瓶颈, 但是IOBW吞吐没有瓶颈的话, 可以使用group commit提高性能. 如果连IOBW也有瓶颈的话, 就没有办法提升性能了.
postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2 order by 3 desc; wait_event_type | wait_event | count -----------------+---------------------+------- LWLock | WALWriteLock | 8 Activity | WalSenderMain | 2 Activity | CheckpointerMain | 1 IO | WALInitWrite | 1 Activity | AutoVacuumMain | 1 | | 1 IO | VFSFileOpen | 1 Activity | LogicalLauncherMain | 1 (8 rows)
复制
nbd 服务器, 观察网络、磁盘吞吐:
dstat ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- usr sys idl wai hiq siq| read writ| recv send| in out | int csw 0 2 80 17 0 0| 0 51M| 54M 33M| 0 0 | 61k 99k 0 2 79 18 0 0| 0 50M| 54M 35M| 0 0 | 63k 102k 0 1 89 10 0 0| 0 26M| 28M 36M| 0 0 | 45k 69k 0 1 92 7 0 0| 0 17M| 19M 31M| 0 0 | 36k 55k 0 1 92 7 0 0| 0 18M| 19M 28M| 0 0 | 36k 53k iostat -x 1 avg-cpu: %user %nice %system %iowait %steal %idle 0.13 0.00 0.77 7.17 0.00 91.93 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 vdb 0.00 1142.00 0.00 6946.00 0.00 18460.00 5.32 0.55 0.08 0.00 0.08 0.14 99.20 vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 vdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
复制
监控IO性能
su - root pfsadm mountstat nvme0n1
复制
开启组提交
// Sets the delay in microseconds between transaction commit and flushing WAL to disk. postgres=# alter role polardb set commit_delay ='10'; ALTER ROLE // Sets the minimum concurrent open transactions before performing commit_delay. postgres=# alter role polardb set commit_siblings =5; ALTER ROLE
复制
读写压测调到32个并发
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200
复制
没什么提升效果, 因为存储的IOBW已经打满了.
改成unlogged table后, 使用32个并发的分组提交性能提升比较明显. (注意unlogged table和临时表不支持在RO节点使用, 也就是说主节点创建的unlogged table, 在RO节点是不能查询的.)
progress: 164.0 s, 4019.8 tps, lat 7.965 ms stddev 2.940 progress: 165.0 s, 1676.1 tps, lat 8.152 ms stddev 2.298 progress: 166.0 s, 4027.8 tps, lat 12.503 ms stddev 63.648 progress: 167.0 s, 4445.0 tps, lat 7.197 ms stddev 2.822 progress: 168.0 s, 4357.3 tps, lat 7.342 ms stddev 2.804 progress: 169.0 s, 4567.0 tps, lat 7.006 ms stddev 3.003 progress: 170.0 s, 4648.9 tps, lat 6.881 ms stddev 2.792 progress: 171.0 s, 4427.9 tps, lat 7.226 ms stddev 3.254 progress: 172.0 s, 4468.9 tps, lat 7.163 ms stddev 3.111 progress: 173.0 s, 4571.2 tps, lat 7.003 ms stddev 3.023 progress: 174.0 s, 4695.8 tps, lat 6.814 ms stddev 2.940 progress: 175.0 s, 4627.2 tps, lat 6.914 ms stddev 2.644 progress: 176.0 s, 4466.9 tps, lat 7.159 ms stddev 3.036 progress: 177.0 s, 4508.4 tps, lat 7.109 ms stddev 2.564 progress: 178.0 s, 4474.7 tps, lat 7.143 ms stddev 2.683 progress: 179.0 s, 4476.1 tps, lat 7.156 ms stddev 2.609 progress: 180.0 s, 4622.0 tps, lat 6.924 ms stddev 2.884 progress: 181.0 s, 4726.6 tps, lat 6.770 ms stddev 2.798 progress: 182.0 s, 4480.2 tps, lat 7.142 ms stddev 2.644 progress: 183.0 s, 2224.2 tps, lat 6.899 ms stddev 2.849 progress: 184.0 s, 3633.6 tps, lat 13.391 ms stddev 70.771 progress: 185.0 s, 4314.0 tps, lat 7.416 ms stddev 2.274
复制
敬请期待未来的内容:
- PolarDB PG开源版本备份、恢复、监控、诊断、优化、日常维护等实践.