一个PostgreSQL集群中有8000个数据库?。正如我们已经看到的,这是可行的,但是会带来很多痛苦。满足这种要求的另一种解决方案是在一个主机上托管8000个PostgreSQL集群。再次,我不是说这是您应该做的事情,再次,这不是最终将要实现的方法。这更多地是关于您是否可以执行此操作,以及如何进行这种设置。无论单个集群中的数据库数量是多少,还是单个节点上的集群数是8000,这都是一个巨大的数字。
Linux上的默认PostgreSQL配置带有128MB的shared_buffers。同样,如果我们进行数学计算,则对于单个主机上的8000个群集而言,这将是1’024’000MB。令人惊讶的是我现在没有这样的机器。我们将使用带有4vCPU和16GB内存的AWS EC2 t3a.xlarge。如果将shared_buffers减少到128kB(最小值),我们将得出1024000kB,对于所有群集,这大约是8GB的shared_buffers。这应该是可行的,不计算任何其他与内存相关的参数。
我们需要计算的下一位是所需的磁盘空间。PostgreSQL 13.1的全新初始化PGDATA占用29MB。因此,对于8000个群集,我们将需要312’000MB的磁盘空间。我们开始:
postgres@pg13:/home/postgres/ [pg131] grep MemTotal /proc/meminfo
MemTotal: 16284668 kB
ostgres@pg13:/home/postgres/ [pg131] grep proc /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
postgres@pg13:/home/postgres/ [pg131] df -h
Filesystem Size Used Avail Use% Mounted on
udev 7.8G 0 7.8G 0% /dev
tmpfs 1.6G 8.4M 1.6G 1% /run
/dev/nvme0n1p1 492G 2.9G 469G 1% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme0n1p15 124M 262K 124M 1% /boot/efi
tmpfs 1.6G 0 1.6G 0% /run/user/1000
初始化8000个群集将花费一些时间,但是过程非常简单。对于每个PGDATA,我们需要一个不同的目录,对于每个集群,我们需要一个不同的端口。另外,我们需要将shared_buffers减小到128kB,然后启动所有集群。首先,让我们创建所有数据目录,并检查是否已经遇到任何与此相关的问题:
port="5432"
for i in {1..8000}
do
PGDATA="/u02/pgdata/pgdata${i}"
initdb -D ${PGDATA} > /dev/null 2>&1 > /dev/null
echo "port=${port}" >> ${PGDATA}/postgresql.auto.conf
echo "shared_buffers=128kB" >> ${PGDATA}/postgresql.auto.conf
(( port += 1 ))
done
如果要监视进度,则可以执行以下操作:
postgres@pg13:/home/postgres/ [pg131] watch "ls -l /u02/pgdata | grep -v total | wc -l"
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 10:50:41 2021
146
...
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 11:06:14 2021
1197
...
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 11:13:27 2021
1696
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 11:54:09 2021
4452
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 12:19:44 2021
6116
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l pg13: Fri Feb 12 13:35:24 2021
8000
这需要相当长的时间。还是我们消耗多少空间?
postgres@pg13:/home/postgres/ [pg131] du -sh /u02/pgdata/
308G /u02/pgdata/
与我们上面的计算相距不远。在尝试启动所有集群之前,我们需要考虑一下其他方面。如果我们启动PostgreSQL 13.1实例,默认情况下我们有多少个进程?
postgres@debian10pg:/home/postgres/ [pg131] ps -ef | egrep "postgres \-D|postgres\:"
postgres 28408 1 0 07:42 ? 00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV
postgres 28409 28408 0 07:42 ? 00:00:00 postgres: logger
postgres 28411 28408 0 07:42 ? 00:00:00 postgres: checkpointer
postgres 28412 28408 0 07:42 ? 00:00:00 postgres: background writer
postgres 28413 28408 0 07:42 ? 00:00:00 postgres: walwriter
postgres 28414 28408 0 07:42 ? 00:00:00 postgres: autovacuum launcher
postgres 28415 28408 0 07:42 ? 00:00:00 postgres: stats collector
postgres 28416 28408 0 07:42 ? 00:00:00 postgres: logical replication launcher
ostgres@debian10pg:/home/postgres/ [pg131] ps -ef | egrep "postgres \-D|postgres\:" | wc -l
8
每个集群八个,然后乘以8000,就形成了64’000个进程。在Debian上的默认配置中,我们有以下限制:
postgres@debian10pg:/home/postgres/ [pg131] ulimit -a
postgres@pg13:/home/postgres/ [pg131] ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63541
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63541
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
当然可以,因为我们可以轻松尝试:
port="5432"
for i in {1..8000}
do
PGDATA="/u02/pgdata/pgdata${i}"
pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null
(( port += 1 ))
done
直到大约1500位邮局主管开始工作之前,这将很好:
unable to fork process: Resource temporarily unavailable
……所以我们显然在这里遇到了一些限制。让我们增加postgres用户的最大打开文件数和处理限制(-1表示无限制,这里只有硬限制):
postgres@pg13:/home/postgres/ [pg131] cat /etc/security/limits.conf | egrep -v "^#|^$"
postgres hard nproc -1
postgres hard nofile -1
重新启动会话后,我们可以检查硬限制的实际值:
postgres@pg13:/home/postgres/ [pg131] ulimit -H -n
1048576
postgres@pg13:/home/postgres/ [pg131] ulimit -H -u
unlimited
这意味着我们应该能够创建无限数量的用户进程,并且我们应该能够打开1048576文件(这似乎是Debian上的最大值)。由于这些是硬性限制,因此我们也需要增加软性限制:
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63540
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
让我们检查一下是否足以启动8000个集群:
postgres@pg13:/home/postgres/ [pg131] port="5432"
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
> PGDATA="/u02/pgdata/pgdata${i}"
> pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null
> (( port += 1 ))
> done
不,我们仍然会达到极限(大约1500个集群):
no data was returned by command ""/u01/app/postgres/product/13/db_1/bin/postgres" -V"
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/u01/app/postgres/product/13/db_1/bin/pg_ctl".
Check your installation.
sh: 1: Cannot fork
让我们尝试使用更大的实例类型:c5.9xlarge:此实例带有36vCPU和72GiB的内存。那行得通吗?
postgres@pg13:/home/postgres/ [pg131] cat /proc/cpuinfo | grep proc | wc -l
36
postgres@pg13:/home/postgres/ [pg131] cat /proc/meminfo | grep Mem
MemTotal: 72033208 kB
MemFree: 71577940 kB
MemAvailable: 71139240 kB
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
> PGDATA="/u02/pgdata/pgdata${i}"
> pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null
> (( port += 1 ))
> done
我们可以走得更远一些,达到大约。1740个正在运行的群集,但是距离8000还很远:
postgres@pg13:/home/postgres/ [pg131] ps -ef | grep postgres | grep "postgres \-D" | wc -l
1736
如果我们增加更多并转到c5a.24xlarge,我们将得出(96vCPU和192GiB内存):
postgres@pg13:/home/postgres/ [pg131] cat /proc/cpuinfo | grep proc | wc -l
96
postgres@pg13:/home/postgres/ [pg131] cat /proc/meminfo | grep Mem
MemTotal: 195998200 kB
MemFree: 195223608 kB
MemAvailable: 194067016 kB
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}; do PGDATA="/u02/pgdata/pgdata${i}"; pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null; (( port += 1 )); done
这将为我们提供大约4096个群集,这是我们想要的8000个群集的一半多一点。但这也意味着我们要达到另一个极限。4000个集群将消耗32000个进程,而我们还没有达到上面设置的极限。之后怎么样了?显然,vCPU数量的增加为我们提供了更多正在运行的群集,因此vCPU的数量与我们可以启动的进程数量之间必须存在某种关系。有一个内核参数限制了一个Linux系统上可以启动的最大进程数:
postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
98304
postgres@pg13:/home/postgres/ [pg131] man proc 5
/proc/sys/kernel/pid_max (since Linux 2.5.34)
This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total
number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels. On 32-bit platforms, 32768 is the maximum value for pid_max. On 64-bit systems, pid_max can be set to any value up
to 2^22 (PID_MAX_LIMIT, approximately 4 million).
我们已经高于默认值32768,该值取决于CPU的数量(更多信息,请参见)。这实际上不是一个真正的限制,但是一旦达到该数目,计数器将再次从1开始,如果没有剩余数目,则无法启动其他过程。如果将实例缩小到c5.large(4vCPU和8GiB内存),我们将再次获得默认值:
postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
32768
尽管我们不应该达到98304的极限,但让我们尝试增加该极限并再次运行。这次我们还将看一下内存使用情况:
postgres@pg13:/home/postgres/ [pg131] sudo bash
root@pg13:/home/postgres$ echo "196608" > /proc/sys/kernel/pid_max
root@pg13:/home/postgres$ exit
postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
196608
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
> PGDATA="/u02/pgdata/pgdata${i}"
> pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null
> (( port += 1 ))
> done
在第二个会话中监视内存使用情况:
postgres@pg13:/home/postgres/ [pg131] watch "free -m"
## with approx.1000 clusters
Every 2.0s: free -m pg13: Tue Feb 16 13:16:50 2021
total used free shared buff/cache available
Mem: 189152 6412 177810 4183 4929 177353
Swap: 0 0 0
## with approx. 2000 clusters
Every 2.0s: free -m pg13: Tue Feb 16 13:18:38 2021
total used free shared buff/cache available
Mem: 189152 12098 167470 8276 9583 167516
Swap: 0 0 0
## with approx. 3000 clusters
Every 2.0s: free -m pg13: Tue Feb 16 13:20:26 2021
total used free shared buff/cache available
Mem: 189152 17803 157119 12369 14229 157664
Swap: 0 0 0
## with approx. 4000 clusters
Every 2.0s: free -m pg13: Tue Feb 16 13:22:16 2021
total used free shared buff/cache available
Mem: 189152 23673 146495 16540 18983 147562
Swap: 0 0 0
…仍然失败,只有4000多个集群。问题不是整体内存消耗,我们仍然有足够的内存。答案在PostgreSQL启动日志中:
2021-02-16 13:43:17.990 UTC [97614] FATAL: could not create shared memory segment: No space left on device
2021-02-16 13:43:17.990 UTC [97614] DETAIL: Failed system call was shmget(key=21380032, size=56, 03600).
2021-02-16 13:43:17.990 UTC [97614] HINT: This error does *not* mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.
The PostgreSQL documentation contains more information about shared memory configuration.
2021-02-16 13:43:17.990 UTC [97614] LOG: database system is shut down
因此,我们需要增加shmmni以便有足够的可用共享内存段。下次尝试:
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -w kernel.shmmni=8192
kernel.shmmni = 8192
postgres@pg13:/home/postgres/ [pg131] sysctl -p
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -a | grep kernel.shmmni
kernel.shmmni = 8192
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
> PGDATA="/u02/pgdata/pgdata${i}"
> pg_ctl start -D ${PGDATA}/ -l /dev/null 2>&1 > /dev/null
> (( port += 1 ))
> done
sh: 1: Cannot fork
no data was returned by command ""/u01/app/postgres/product/13/db_1/bin/postgres" -V"
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/u01/app/postgres/product/13/db_1/bin/pg_ctl".
Check your installation.
下次达到极限,现在我们达到了大约4600个群集,虽然更好,但仍然不是8000。这一次我们要达到打开文件的数量:
postgres@pg13:/home/postgres/ [pg131] sudo lsof -u postgres 2>/dev/null | wc -l
1884863
postgres@pg13:/home/postgres/ [pg131] ulimit -Hn
1048576
但是,即使我们增加了系统范围的限制,也会遇到相同的问题,因为我们不能将用户定义的限制提高超过1048576:
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -w fs.file-max=100000000
fs.file-max = 100000000
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -a | grep fs.file-max
fs.file-max = 100000000
原因是在sshd_config中:
postgres@pg13:/home/postgres/ [pg131] sudo cat /etc/ssh/sshd_config | grep -i UsePam
UsePAM yes
PAM负责设置用户限制。让我们通过删除这一行并重新启动sshd来跳过它:
postgres@pg13:/home/postgres/ [pg131] sudo sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
postgres@pg13:/home/postgres/ [pg131] grep UsePAM /etc/ssh/sshd_config
UsePAM no
postgres@pg13:/home/postgres/ [pg131] sudo systemctl restart sshd
重新启动ssh会话后,让我们再试一次:
dwe@ltdwe:~$ ssh -i /home/dwe/Documents/aws/dwe-key-pair.pem admin@18.195.216.2
admin@18.195.216.2: Permission denied (publickey).
下期,我们将无法登录。这样做的方法是禁用执行限制的行:
postgres@debian10pg:/home/postgres/ [pg14] grep -A 1 "limits" /etc/pam.d/sshd
# access limits that are hard to express in sshd_config.
# account required pam_access.so
--
# Set up user limits from /etc/security/limits.conf.
# session required pam_limits.so
重新启动sshd之后,让我们再试一次:
port="5432"
for i in {1..8000}
do
PGDATA="/u02/pgdata/pgdata${i}"
pg_ctl start -D ${PGDATA}/ -l /home/postgres/${i}log.log
(( port += 1 ))
done
…仍然失败:
2021-02-19 15:49:38.457 UTC [109716] LOG: listening on Unix socket "/tmp/.s.PGSQL.10063"
2021-02-19 15:49:38.461 UTC [109717] LOG: database system was shut down at 2021-02-19 15:39:59 UTC
2021-02-19 15:49:38.465 UTC [109716] LOG: could not fork worker process: Resource temporarily unavailable
2021-02-19 15:49:38.465 UTC [109716] LOG: database system is ready to accept connections
2021-02-19 15:49:50.889 UTC [109716] LOG: received smart shutdown request
下一个测试是检查是否达到用户/会话数或系统范围的限制。为此,我们将使用已有的postgres用户启动前4000个集群。对于其余的4000个群集,我们将创建另一个用户,将$ PGDATA更改为该用户拥有,然后尝试启动它们。如果可行,我们至少可以使用两个单独的OS用户启动8000集群:
root@ip-10-0-1-21:/home/admin# useradd -g postgres -m postgres2
root@ip-10-0-1-21:/home/admin# for i in {4001..8000}
> do
> PGDATA="/u02/pgdata/pgdata${i}"
> chown -R postgres2:postgres ${PGDATA}
> done
Every 2.0s: ps -ef | egrep 'postgres \-D' | wc -l ip-10-0-1-21: Tue Feb 23 10:38:49 2021
4000
前4000个集群:
postgres@ip-10-0-1-21:/home/postgres/ [pg132] for i in {1..4000}; do PGDATA="/u02/pgdata/pgdata${i}"; pg_ctl start -D ${PGDATA}/ -l /home/postgres/${i}log.log; done
Every 2.0s: ps -ef | egrep 'postgres \-D' | wc -l ip-10-0-1-21: Tue Feb 23 10:38:49 2021
4000
让我们检查一下是否可以使用另一个用户启动另一个4000:
postgres2@ip-10-0-1-21:/home/postgres2/ [pg132] id -a
uid=1002(postgres2) gid=1001(postgres) groups=1001(postgres)
postgres@ip-10-0-1-21:/home/postgres2/ [pg132] for i in {1..4000}; do PGDATA="/u02/pgdata/pgdata${i}"; pg_ctl start -D ${PGDATA}/ -l /home/postgres2/${i}log.log; done
postgres2@ip-10-0-1-21:/home/postgres2/ [pg132] cat 4632log.log
2021-02-23 10:45:38.573 UTC [61580] LOG: starting PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-02-23 10:45:38.573 UTC [61580] LOG: listening on IPv4 address "127.0.0.1", port 10063
2021-02-23 10:45:38.578 UTC [61580] LOG: listening on Unix socket "/tmp/.s.PGSQL.10063"
2021-02-23 10:45:38.585 UTC [61581] LOG: database system was shut down at 2021-02-21 17:53:04 UTC
2021-02-23 10:45:38.597 UTC [61580] LOG: could not fork worker process: Resource temporarily unavailable
2021-02-23 10:45:38.597 UTC [61580] LOG: database system is ready to accept connections
一旦我们达到大约4600个群集,就会导致相同的问题。在这一点上,我决定停止这个实验,因为结论很明确:出于明显的原因,不要这样做:
- 您会遇到各种各样的限制
- 在一台主机上管理8000个群集是行不通的,您肯定会一次犯一个错误,然后一个或多个群集可能会损坏
- 只需查看打开文件的数量和进程的数量,就应该再次告诉您:不要这样做
- 许多很多其他原因(备份/还原/修补/升级)
作 者:Daniel Westermann
文章来源:https://blog.dbi-services.com/8000-postgresql-clusters-on-one-host/




