
8000 PostgreSQL clusters on one host?




8,000 databases in one PostgreSQL cluster? As we have already seen, that is doable, but it brings a lot of pain. Another solution for this kind of requirement would be to host 8,000 PostgreSQL clusters on a single host. Again, I am not saying this is something you should do, and again, this is not how it will be implemented in the end. It is more about whether you can do it at all, and how you would approach such a setup. No matter whether it is 8,000 databases in a single cluster or 8,000 clusters on a single node, it is a huge number either way.

The default PostgreSQL configuration on Linux comes with 128MB of shared_buffers. If we do the math again, that would be 1,024,000MB for 8,000 clusters on a single host. Surprisingly enough, I do not have such a machine at hand right now, so we will use an AWS EC2 t3a.xlarge with 4 vCPUs and 16GB of memory. If we reduce shared_buffers to 128kB (the minimum), we end up with 1,024,000kB, which is roughly 1GB of shared_buffers for all the clusters combined. That should be doable, not taking any of the other memory-related parameters into account.
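
Just to have the numbers side by side, here is the same back-of-the-envelope calculation as shell arithmetic (a quick sketch, not part of the original post):

# 8000 clusters: default shared_buffers (128MB) vs. the 128kB minimum
clusters=8000
echo "default: $(( clusters * 128 )) MB"        # 1024000 MB, roughly 1TB
echo "minimum: $(( clusters * 128 / 1024 )) MB" # 1000 MB, roughly 1GB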

The next bit we need to calculate is the required disk space. A freshly initialized PGDATA with PostgreSQL 13.1 takes around 39MB, so for 8,000 clusters we will need roughly 312,000MB of disk space. Here we go:

postgres@pg13:/home/postgres/ [pg131] grep MemTotal /proc/meminfo
MemTotal:       16284668 kB
postgres@pg13:/home/postgres/ [pg131] grep proc /proc/cpuinfo
processor       : 0
processor       : 1
processor       : 2
processor       : 3
postgres@pg13:/home/postgres/ [pg131] df -h
Filesystem       Size  Used Avail Use% Mounted on
udev             7.8G     0  7.8G   0% /dev
tmpfs            1.6G  8.4M  1.6G   1% /run
/dev/nvme0n1p1   492G  2.9G  469G   1% /
tmpfs            7.8G     0  7.8G   0% /dev/shm
tmpfs            5.0M     0  5.0M   0% /run/lock
tmpfs            7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/nvme0n1p15  124M  262K  124M   1% /boot/efi
tmpfs            1.6G     0  1.6G   0% /run/user/1000

Initializing 8,000 clusters will take some time, but the procedure is quite simple: each PGDATA needs its own directory and each cluster needs its own port. In addition, we reduce shared_buffers to 128kB and then start all the clusters. First, let's create all the data directories and check whether we already run into any issues with that:

port="5432"
for i in {1..8000}
do
    PGDATA="/u02/pgdata/pgdata${i}"
    initdb -D ${PGDATA} > /dev/null 2>&1 > /dev/null
    echo "port=${port}" >> ${PGDATA}/postgresql.auto.conf
    echo "shared_buffers=128kB" >> ${PGDATA}/postgresql.auto.conf
    (( port += 1 ))
done
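
Initializing the clusters one after the other takes hours, as the watch output below shows. Purely as a sketch (not what the post actually ran), the initdb calls could be parallelized with GNU xargs while keeping the same directory and port layout:

# same layout as above, but run 4 initdb processes in parallel
seq 1 8000 | xargs -P 4 -I{} bash -c '
    i="$1"
    PGDATA="/u02/pgdata/pgdata${i}"
    initdb -D ${PGDATA} > /dev/null 2>&1
    echo "port=$(( 5431 + i ))"  >> ${PGDATA}/postgresql.auto.conf
    echo "shared_buffers=128kB"  >> ${PGDATA}/postgresql.auto.conf
' _ {}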

If you want to monitor the progress, you can do something like this:

postgres@pg13:/home/postgres/ [pg131] watch "ls -l /u02/pgdata | grep -v total | wc -l"
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 10:50:41 2021
 
146
...
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:06:14 2021
 
1197
...
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:13:27 2021
 
1696
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:54:09 2021
 
4452
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 12:19:44 2021
 
6116
Every 2.0s: ls -l /u02/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 13:35:24 2021
 
8000

This takes quite some time. And how much space are we consuming?

postgres@pg13:/home/postgres/ [pg131] du -sh /u02/pgdata/
308G    /u02/pgdata/

Not far off from our calculation above. Before we try to start all the clusters, there is one more thing to think about: how many processes do we get by default when we start a PostgreSQL 13.1 instance?

postgres@debian10pg:/home/postgres/ [pg131] ps -ef | egrep "postgres \-D|postgres\:"
postgres 28408     1  0 07:42 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV
postgres 28409 28408  0 07:42 ?        00:00:00 postgres: logger 
postgres 28411 28408  0 07:42 ?        00:00:00 postgres: checkpointer 
postgres 28412 28408  0 07:42 ?        00:00:00 postgres: background writer 
postgres 28413 28408  0 07:42 ?        00:00:00 postgres: walwriter 
postgres 28414 28408  0 07:42 ?        00:00:00 postgres: autovacuum launcher 
postgres 28415 28408  0 07:42 ?        00:00:00 postgres: stats collector 
postgres 28416 28408  0 07:42 ?        00:00:00 postgres: logical replication launcher 
postgres@debian10pg:/home/postgres/ [pg131] ps -ef | egrep "postgres \-D|postgres\:" | wc -l
8

That is eight per cluster, and multiplied by 8,000 it adds up to 64,000 processes. With the default configuration on Debian we have the following limits:

postgres@pg13:/home/postgres/ [pg131] ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63541
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63541
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Will that be enough? We can easily find out by simply trying it:

port="5432"
for i in {1..8000}
do
    PGDATA="/u02/pgdata/pgdata${i}"
    pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null
    (( port += 1 ))
done

This works fine until roughly 1,500 postmasters have been started:

unable to fork process: Resource temporarily unavailable

…so we obviously hit some limit here. Let's increase the maximum number of open files and the maximum number of processes for the postgres user (-1 means unlimited; only the hard limits are set here):

postgres@pg13:/home/postgres/ [pg131] cat /etc/security/limits.conf  | egrep -v "^#|^$"
postgres         hard     nproc          -1
postgres         hard     nofile         -1
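
The post only shows the resulting file; one way to put those two entries in place (a sketch to be run as root, assuming the stock /etc/security/limits.conf is what pam_limits reads on this system) is:

# append hard limits for the postgres user (run as root)
cat >> /etc/security/limits.conf <<'EOF'
postgres         hard     nproc          -1
postgres         hard     nofile         -1
EOF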

After starting a new session we can check the actual values of the hard limits:

postgres@pg13:/home/postgres/ [pg131] ulimit -H -n
1048576
postgres@pg13:/home/postgres/ [pg131] ulimit -H -u
unlimited

This means we should be able to create an unlimited number of user processes and open 1,048,576 files (which seems to be the maximum on Debian). As these are the hard limits, we also need to raise the soft limits:

postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63540
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Let's check whether that is enough to start the 8,000 clusters:

postgres@pg13:/home/postgres/ [pg131] port="5432"
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>     PGDATA="/u02/pgdata/pgdata${i}"
>     pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null
>     (( port += 1 ))
> done

No, we still hit a limit at around 1,500 clusters:

no data was returned by command ""/u01/app/postgres/product/13/db_1/bin/postgres" -V"
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/u01/app/postgres/product/13/db_1/bin/pg_ctl".
Check your installation.
sh: 1: Cannot fork

Let's try a bigger instance type, a c5.9xlarge, which comes with 36 vCPUs and 72GiB of memory. Will that work?

postgres@pg13:/home/postgres/ [pg131] cat /proc/cpuinfo | grep proc | wc -l
36
postgres@pg13:/home/postgres/ [pg131] cat /proc/meminfo | grep Mem
MemTotal:       72033208 kB
MemFree:        71577940 kB
MemAvailable:   71139240 kB
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>     PGDATA="/u02/pgdata/pgdata${i}"
>     pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null
>     (( port += 1 ))
> done

We get a bit further, to approximately 1,740 running clusters, but that is still far away from 8,000:

postgres@pg13:/home/postgres/ [pg131] ps -ef | grep postgres | grep "postgres \-D" | wc -l
1736

If we add even more and go to a c5a.24xlarge (96 vCPUs and 192GiB of memory), we end up with:

postgres@pg13:/home/postgres/ [pg131] cat /proc/cpuinfo | grep proc | wc -l
96
postgres@pg13:/home/postgres/ [pg131] cat /proc/meminfo | grep Mem
MemTotal:       195998200 kB
MemFree:        195223608 kB
MemAvailable:   194067016 kB
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}; do     PGDATA="/u02/pgdata/pgdata${i}";     pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null;     (( port += 1 )); done

That gives us around 4,096 clusters, a bit more than half of the 8,000 we want. But it also means we are hitting yet another limit: 4,000 clusters consume 32,000 processes, and we are nowhere near the limits we raised above. So what is going on? Apparently, adding vCPUs gave us more running clusters, so there must be some relationship between the number of vCPUs and the number of processes we can start. There is a kernel parameter that limits the maximum number of processes on a Linux system:

postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
98304
postgres@pg13:/home/postgres/ [pg131] man proc 5
       /proc/sys/kernel/pid_max (since Linux 2.5.34)
              This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID).  PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on  the  total
              number  of  processes and threads.  The default value for this file, 32768, results in the same range of PIDs as on earlier kernels.  On 32-bit platforms, 32768 is the maximum value for pid_max.  On 64-bit systems, pid_max can be set to any value up
              to 2^22 (PID_MAX_LIMIT, approximately 4 million).

We are already above the default of 32,768, and this value depends on the number of CPUs. It is not really a hard limit as such, but once that number is reached the counter wraps around and starts again at the low end, and if there are no free PIDs left, no additional processes can be started. If we scale the instance down to a c5.large (4 vCPUs and 8GiB of memory), we get the default value again:

postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
32768

Although we should not be hitting the 98,304 limit, let's increase it anyway and run the loop again. This time we will also have a look at the memory usage:

postgres@pg13:/home/postgres/ [pg131] sudo bash
root@pg13:/home/postgres$ echo "196608" > /proc/sys/kernel/pid_max
root@pg13:/home/postgres$ exit
postgres@pg13:/home/postgres/ [pg131] cat /proc/sys/kernel/pid_max
196608
postgres@pg13:/home/postgres/ [pg131] ulimit -S -u unlimited
postgres@pg13:/home/postgres/ [pg131] ulimit -S -n 1048576
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>     PGDATA="/u02/pgdata/pgdata${i}"
>     pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null
>     (( port += 1 ))
> done

Monitoring the memory usage in a second session:

postgres@pg13:/home/postgres/ [pg131] watch "free -m"
## with approx. 1000 clusters
Every 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:16:50 2021
 
              total        used        free      shared  buff/cache   available
Mem:         189152        6412      177810        4183        4929      177353
Swap:             0           0           0
 
## with approx. 2000 clusters
Every 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:18:38 2021
 
              total        used        free      shared  buff/cache   available
Mem:         189152       12098      167470        8276        9583      167516
Swap:             0           0           0
 
## with approx. 3000 clusters
Every 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:20:26 2021
 
              total        used        free      shared  buff/cache   available
Mem:         189152       17803      157119       12369       14229      157664
Swap:             0           0           0
 
## with approx. 4000 clusters
Every 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:22:16 2021
 
              total        used        free      shared  buff/cache   available
Mem:         189152       23673      146495       16540       18983      147562
Swap:             0           0           0

…and it still fails, at a bit over 4,000 clusters. The problem is not the overall memory consumption; there is still plenty of memory available. The answer is in the PostgreSQL startup log:

2021-02-16 13:43:17.990 UTC [97614] FATAL:  could not create shared memory segment: No space left on device
2021-02-16 13:43:17.990 UTC [97614] DETAIL:  Failed system call was shmget(key=21380032, size=56, 03600).
2021-02-16 13:43:17.990 UTC [97614] HINT:  This error does *not* mean that you have run out of disk space.  It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.
        The PostgreSQL documentation contains more information about shared memory configuration.
2021-02-16 13:43:17.990 UTC [97614] LOG:  database system is shut down
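
The hint in the log is easy to verify: every running cluster registers one small System V shared memory segment (that is the shmget call with size=56 above), so we can compare the number of existing segments with the kernel limit. A quick check, assuming ipcs from util-linux is available:

# shared memory segments in use vs. what the kernel allows
ipcs -m | grep -c "^0x"
cat /proc/sys/kernel/shmmni

Since shmmni defaults to 4,096 on Linux, this would also explain why the number of running clusters stalled at roughly 4,096.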

So we need to increase shmmni so that enough shared memory segments are available. Next try:

postgres@pg13:/home/postgres/ [pg131] sudo sysctl -w kernel.shmmni=8192
kernel.shmmni = 8192
postgres@pg13:/home/postgres/ [pg131] sysctl -p
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -a | grep kernel.shmmni
kernel.shmmni = 8192
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>     PGDATA="/u02/pgdata/pgdata${i}"
>     pg_ctl start -D ${PGDATA}/  -l /dev/null 2>&1 > /dev/null
>     (( port += 1 ))
> done
 
sh: 1: Cannot fork
no data was returned by command ""/u01/app/postgres/product/13/db_1/bin/postgres" -V"
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/u01/app/postgres/product/13/db_1/bin/pg_ctl".
Check your installation.

The next limit is hit: this time we got to around 4,600 clusters, which is better, but still not 8,000. Now we are hitting the number of open files:

postgres@pg13:/home/postgres/ [pg131] sudo lsof -u postgres 2>/dev/null | wc -l
1884863
postgres@pg13:/home/postgres/ [pg131] ulimit -Hn
1048576

But even if we increase the system-wide limit we run into the same issue, because we cannot raise the per-user limit above 1,048,576:

postgres@pg13:/home/postgres/ [pg131] sudo sysctl -w fs.file-max=100000000
fs.file-max = 100000000
postgres@pg13:/home/postgres/ [pg131] sudo sysctl -a | grep fs.file-max
fs.file-max = 100000000

The reason for that is in sshd_config:

postgres@pg13:/home/postgres/ [pg131]  sudo cat /etc/ssh/sshd_config | grep -i UsePam
UsePAM yes

PAM is responsible for applying the user limits. Let's try to bypass it by switching UsePAM off and restarting sshd:

postgres@pg13:/home/postgres/ [pg131] sudo sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
postgres@pg13:/home/postgres/ [pg131] grep UsePAM /etc/ssh/sshd_config
UsePAM no
postgres@pg13:/home/postgres/ [pg131] sudo systemctl restart sshd

After reconnecting the ssh session, let's try again:

dwe@ltdwe:~$ ssh -i /home/dwe/Documents/aws/dwe-key-pair.pem admin@18.195.216.2
admin@18.195.216.2: Permission denied (publickey).

Not like that: we cannot log in anymore. The proper way is to disable the line in /etc/pam.d/sshd that actually enforces the limits:

postgres@debian10pg:/home/postgres/ [pg14] grep -A 1 "limits" /etc/pam.d/sshd 
# access limits that are hard to express in sshd_config.
# account  required     pam_access.so
--
# Set up user limits from /etc/security/limits.conf.
# session    required     pam_limits.so
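
The grep above only shows the end result. A sketch of how to get there (it assumes UsePAM has been switched back to yes in sshd_config, since disabling PAM is what broke the key-based login above):

# comment out the pam_limits module for ssh sessions, then restart sshd
sudo sed -i 's/^session[[:space:]]*required[[:space:]]*pam_limits\.so/# &/' /etc/pam.d/sshd
sudo systemctl restart sshd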

After restarting sshd, let's try once more:

port="5432"
for i in {1..8000}
do
    PGDATA="/u02/pgdata/pgdata${i}"
    pg_ctl start -D ${PGDATA}/  -l /home/postgres/${i}log.log
    (( port += 1 ))
done

…and it still fails:

2021-02-19 15:49:38.457 UTC [109716] LOG:  listening on Unix socket "/tmp/.s.PGSQL.10063"
2021-02-19 15:49:38.461 UTC [109717] LOG:  database system was shut down at 2021-02-19 15:39:59 UTC
2021-02-19 15:49:38.465 UTC [109716] LOG:  could not fork worker process: Resource temporarily unavailable
2021-02-19 15:49:38.465 UTC [109716] LOG:  database system is ready to accept connections
2021-02-19 15:49:50.889 UTC [109716] LOG:  received smart shutdown request

The next test is to check whether we are hitting a per-user/per-session limit or a system-wide limit. For that, we start the first 4,000 clusters with the existing postgres user. For the remaining 4,000 clusters we create another user, change the ownership of the corresponding $PGDATA directories to that user, and then try to start them. If that works, we could at least get 8,000 clusters running by using two separate OS users:

root@ip-10-0-1-21:/home/admin# useradd -g postgres -m postgres2
root@ip-10-0-1-21:/home/admin# for i in {4001..8000}
> do
>     PGDATA="/u02/pgdata/pgdata${i}"
>     chown -R postgres2:postgres ${PGDATA}
> done

The first 4,000 clusters:

postgres@ip-10-0-1-21:/home/postgres/ [pg132] for i in {1..4000}; do     PGDATA="/u02/pgdata/pgdata${i}";     pg_ctl start -D ${PGDATA}/  -l /home/postgres/${i}log.log; done
Every 2.0s: ps -ef | egrep 'postgres \-D' | wc -l                                                                                                                           ip-10-0-1-21: Tue Feb 23 10:38:49 2021
 
4000

Let's check whether we can start the other 4,000 with the second user:

postgres2@ip-10-0-1-21:/home/postgres2/ [pg132] id -a
uid=1002(postgres2) gid=1001(postgres) groups=1001(postgres)
postgres2@ip-10-0-1-21:/home/postgres2/ [pg132] for i in {4001..8000}; do     PGDATA="/u02/pgdata/pgdata${i}";     pg_ctl start -D ${PGDATA}/  -l /home/postgres2/${i}log.log; done
postgres2@ip-10-0-1-21:/home/postgres2/ [pg132] cat 4632log.log
2021-02-23 10:45:38.573 UTC [61580] LOG:  starting PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-02-23 10:45:38.573 UTC [61580] LOG:  listening on IPv4 address "127.0.0.1", port 10063
2021-02-23 10:45:38.578 UTC [61580] LOG:  listening on Unix socket "/tmp/.s.PGSQL.10063"
2021-02-23 10:45:38.585 UTC [61581] LOG:  database system was shut down at 2021-02-21 17:53:04 UTC
2021-02-23 10:45:38.597 UTC [61580] LOG:  could not fork worker process: Resource temporarily unavailable
2021-02-23 10:45:38.597 UTC [61580] LOG:  database system is ready to accept connections

Once we get to around 4,600 clusters in total, we run into the same issue again. At this point I decided to stop the experiment, because the conclusion is clear: do not do this, for obvious reasons:

  • You will run into all kinds of limits
  • Managing 8,000 clusters on one host is not something that works in practice; sooner or later you will make a mistake and one or more clusters may end up corrupted
  • Just looking at the number of open files and the number of processes should, once again, tell you: do not do it
  • Many, many other reasons (backup/restore/patching/upgrades)

Author: Daniel Westermann
Source: https://blog.dbi-services.com/8000-postgresql-clusters-on-one-host/
