PostgreSQL调整OOM参数的方法
0. OOM概述
当数据库服务器的物理内存和交换内存耗尽时,Linux会发生out-of-memory (OOM) killer。如果PostgreSQL数据库遇到OOM,可以采取以下措施解决:
-
如果是PostgreSQL数据库本身导致OOM的,可以采取以下措施:
-
降低内存相关参数的设置,主要是参数
shared_buffers
和work_mem
。 -
检查是否是数据库连接数太多,是否可以通过参数
max_connections
减少数据库的最大连接数;是否可以使用外部连接池工具。 -
On Linux 2.6 and later,修改内核参数尽量避免(该方法不能完全避免OOM Killer),只能降低出现OOM Killer的机会。
sysctl -w vm.overcommit_memory=2 # 或者 echo 'vm.overcommit_memory=2' >> /etc/sysctl.conf sysctl -p
复制 -
设置进程相关的OOM分数调整值(score adjustment value),例如为postmaster进程设置为
-1000
,这样就会保证PostgreSQL进程不会成为系统OOM Killer的目标:The simplest way to do this is to execute the following script in the postmaster’s startup script just before invoking the postmaster. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking the postmaster:
echo -1000 > /proc/self/oom_score_adj
复制export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj export PG_OOM_ADJUST_VALUE=0
复制These settings will cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them at need. You could use some other value for
PG_OOM_ADJUST_VALUE
if you want the child processes to run with some other OOM score adjustment. (PG_OOM_ADJUST_VALUE
can also be omitted, in which case it defaults to zero.) If you do not setPG_OOM_ADJUST_FILE
, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.这些设置将导致 postmaster 子进程使用普通的值为零的
OOM score adjustment
运行,所以OOM 杀手仍能在需要时把它们作为目标。如果你想要子进程用某些其他OOM score adjustment
值运行,可以为PG_OOM_ADJUST_VALUE
使用其他的值(PG_OOM_ADJUST_VALUE
也能被省略,那时它会被默认为零)。如果你没有设置PG_OOM_ADJUST_FILE
,子进程将使用和 postmaster 相同的OOM score adjustment
运行,这是不明智的,因为重点是确保 postmaster具有优先的设置。Older Linux kernels do not offer
/proc/self/oom_score_adj
, but may have a previous version of the same functionality called/proc/self/oom_adj
. This works the same except the disable value is-17
not-1000
.更老的 Linux 内核不提供
/proc/self/oom_score_adj
,但是可能有一个具有相同功能的早期版本,它被称为/proc/self/oom_adj
。这种方式工作起来完全相同,除了禁用值是-17而不是-1000。
-
1. PostgreSQL源码中关于oom环境变量的设置
Linux系统下,默认发生OOM时,系统会kill进程。通过环境变量,可以设置PostgreSQL的OOM参数(oom_adj值):
在源文件src/backend/postmaster/fork_process.c
文件的函数pid_t fork_process(void)
中有读取环境变量PG_OOM_ADJUST_FILE
(指定oom调整文件)和PG_OOM_ADJUST_VALUE
(OOM分数调整值):
/*
* By default, Linux tends to kill the postmaster in out-of-memory
* situations, because it blames the postmaster for the sum of child
* process sizes *including shared memory*. (This is unbelievably
* stupid, but the kernel hackers seem uninterested in improving it.)
* Therefore it's often a good idea to protect the postmaster by
* setting its OOM score adjustment negative (which has to be done in
* a root-owned startup script). Since the adjustment is inherited by
* child processes, this would ordinarily mean that all the
* postmaster's children are equally protected against OOM kill, which
* is not such a good idea. So we provide this code to allow the
* children to change their OOM score adjustments again. Both the
* file name to write to and the value to write are controlled by
* environment variables, which can be set by the same startup script
* that did the original adjustment.
*/
oomfilename = getenv("PG_OOM_ADJUST_FILE");
if (oomfilename != NULL)
{
/*
* Use open() not stdio, to ensure we control the open flags. Some
* Linux security environments reject anything but O_WRONLY.
*/
int fd = open(oomfilename, O_WRONLY, 0);
/* We ignore all errors */
if (fd >= 0)
{
const char *oomvalue = getenv("PG_OOM_ADJUST_VALUE");
int rc;
if (oomvalue == NULL) /* supply a useful default */
oomvalue = "0";
rc = write(fd, oomvalue, strlen(oomvalue));
(void) rc;
close(fd);
}
}
复制
2. PostgreSQL官方文档关于oom环境变量的设置说明
2.1. 17.4.4. Linux Memory Overcommit
In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL. Because of the way that the kernel implements memory overcommit, the kernel might terminate the PostgreSQL postmaster (the master server process) if the memory demands of either PostgreSQL or another process cause the system to run out of virtual memory.
If this happens, you will see a kernel message that looks like this (consult your system documentation and configuration on where to look for such a message):
Out of Memory: Killed process 12345 (postgres).
复制
This indicates that the postgres
process has been terminated due to memory pressure. Although existing database connections will continue to function normally, no new connections will be accepted. To recover, PostgreSQL will need to be restarted.
One way to avoid this problem is to run PostgreSQL on a machine where you can be sure that other processes will not run the machine out of memory. If memory is tight, increasing the swap space of the operating system can help avoid the problem, because the out-of-memory (OOM) killer is invoked only when physical memory and swap space are exhausted.
If PostgreSQL itself is the cause of the system running out of memory, you can avoid the problem by changing your configuration. In some cases, it may help to lower memory-related configuration parameters, particularly shared_buffers
and work_mem
. n other cases, the problem may be caused by allowing too many connections to the database server itself. In many cases, it may be better to reduce max_connections
and instead make use of external connection-pooling software.
On Linux 2.6 and later, it is possible to modify the kernel’s behavior so that it will not “overcommit” memory. Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior. This is done by selecting strict overcommit mode via sysctl
:
sysctl -w vm.overcommit_memory=2
复制
or placing an equivalent entry in /etc/sysctl.conf
. You might also wish to modify the related setting vm.overcommit_ratio
. For details see the kernel documentation file Documentation/vm/overcommit-accounting
.
Another approach, which can be used with or without altering vm.overcommit_memory
, is to set the process-specific OOM score adjustment value for the postmaster process to -1000
, thereby guaranteeing it will not be targeted by the OOM killer. The simplest way to do this is to execute
echo -1000 > /proc/self/oom_score_adj
复制
in the postmaster’s startup script just before invoking the postmaster. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking the postmaster:
export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj export PG_OOM_ADJUST_VALUE=0
复制
These settings will cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them at need. You could use some other value for PG_OOM_ADJUST_VALUE
if you want the child processes to run with some other OOM score adjustment. (PG_OOM_ADJUST_VALUE
can also be omitted, in which case it defaults to zero.) If you do not set PG_OOM_ADJUST_FILE
, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.
Older Linux kernels do not offer /proc/self/oom_score_adj
, but may have a previous version of the same functionality called /proc/self/oom_adj
. This works the same except the disable value is -17
not -1000
.
Note: Some vendors’ Linux 2.4 kernels are reported to have early versions of the 2.6 overcommit
sysctl
parameter. However, settingvm.overcommit_memory
to 2 on a 2.4 kernel that does not have the relevant code will make things worse, not better. It is recommended that you inspect the actual kernel source code (see the functionvm_enough_memory
in the filemm/mmap.c
) to verify what is supported in your kernel before you try this in a 2.4 installation. The presence of theovercommit-accounting
documentation file should not be taken as evidence that the feature is there. If in any doubt, consult a kernel expert or your kernel vendor.
2.2. 配置systemctl服务
注意:因为需要root用户才可以设置OOMAdjust值,所以需要编写systemctl服务,在服务文件中设置oom环境变量,具体如下:
vi /etc/systemd/system/postgresql.service
复制
[Unit]
Description=PostgreSQL database server
After=network.target
[Service]
Type=forking
User=postgres
Group=postgres
# Where to send early-startup messages from the server (before the logging
# options of postgresql.conf take effect)
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
# Disable OOM kill on the postmaster
OOMScoreAdjust=-1000
# ... but allow it still to be effective for child processes
# (note that these settings are ignored by Postgres releases before 9.5)
Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
Environment=PG_OOM_ADJUST_VALUE=-1000
# Maximum number of seconds pg_ctl will wait for postgres to start. Note that
# PGSTARTTIMEOUT should be less than TimeoutSec value.
Environment=PGSTARTTIMEOUT=270
Environment=PGDATA=/mdb/postgres/pg5432
PrivateTmp=true #是否给服务分配独立的临时空间(true/false),要给的
ExecStart=/usr/local/postgres/bin/pg_ctl start -D ${PGDATA} -s -w -t ${PGSTARTTIMEOUT}
ExecStop=/usr/local/postgres/bin/pg_ctl stop -D ${PGDATA} -s -m fast
ExecReload=/usr/local/postgres/bin/pg_ctl reload -D ${PGDATA} -s
# Give a reasonable amount of time for the server to start up/shut down.
# Ideally, the timeout for starting PostgreSQL server should be handled more
# nicely by pg_ctl in ExecStart, so keep its timeout smaller than this value.
TimeoutSec=300
[Install]
WantedBy=multi-user.target
复制
systemctl daemon-reload
systemctl enable postgresql
systemctl start postgresql
复制
2.4. 检查systemctl启动的数据库进程的oom值
检查PostgreSQL各进程的OOM配置是否生效
2.4.1. 没有配置OOM参数时
显然,PostgreSQL数据库进程的oom_score_adj都是默认值0,如下:
另外,PostgreSQL的主进程**/usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432**的oom_score是3,其他子进程都是0。
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ pg_ctl start -D /mdb/postgres/pg5432/
waiting for server to start....2021-01-14 15:11:12.764 CST [26931] LOG: redirecting log output to logging collector process
2021-01-14 15:11:12.764 CST [26931] HINT: Future log output will appear in directory "log".
done
server started
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ ps aux|grep postgres
root 26348 0.0 0.0 235992 6812 pts/1 S 15:11 0:00 su - postgres
postgres 26349 0.3 0.0 116340 4476 pts/1 S 15:11 0:00 -bash
postgres 26931 1.3 0.5 1302120 95380 ? Ss 15:11 0:00 /usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432
postgres 26932 0.0 0.0 126896 3376 ? Ss 15:11 0:00 postgres: dbdebug: logger
postgres 26934 0.0 0.0 1302120 2744 ? Ss 15:11 0:00 postgres: dbdebug: checkpointer
postgres 26935 0.0 0.0 1302120 2744 ? Ss 15:11 0:00 postgres: dbdebug: background writer
postgres 26936 0.0 0.0 1302120 2744 ? Ss 15:11 0:00 postgres: dbdebug: walwriter
postgres 26937 0.0 0.0 1304092 7212 ? Ss 15:11 0:00 postgres: dbdebug: autovacuum launcher
postgres 26938 0.0 0.0 129012 2580 ? Ss 15:11 0:00 postgres: dbdebug: archiver
postgres 26939 0.0 0.0 129148 3500 ? Ss 15:11 0:00 postgres: dbdebug: stats collector
postgres 26940 0.0 0.0 1303944 5500 ? Ss 15:11 0:00 postgres: dbdebug: logical replication launcher
postgres 27267 0.0 0.0 155372 3852 pts/1 R+ 15:11 0:00 ps aux
postgres 27268 0.0 0.0 112724 2256 pts/1 S+ 15:11 0:00 grep --color=auto postgres
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ cat /proc/{26931,26932,26934,26935,26936,26937,26938,26939}/oom_score
3
0
0
0
0
0
0
0
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ cat /proc/{26931,26932,26934,26935,26936,26937,26938,26939}/oom_score_adj
0
0
0
0
0
0
0
0
[postgres@mdbtest2-dgtestdbs-lv01 ~]$
复制
2.4.2 使用systemctl服务配置oom参数后
显然,PostgreSQL各进程的oom_score_adj时是与配置的环境变量**PG_OOM_ADJUST_VALUE=-1000**
一致的,如下在:
[root@mdbtest2-dgtestdbs-lv01 postgresql-13.0]# systemctl start postgresql
[root@mdbtest2-dgtestdbs-lv01 ~]# ps aux|grep postgres
postgres 1144 0.0 0.5 1302120 95208 ? Ss 15:03 0:00 /usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432
postgres 1164 0.0 0.0 126896 3228 ? Ss 15:03 0:00 postgres: dbdebug: logger
postgres 1168 0.0 0.0 1302636 13248 ? Ss 15:03 0:00 postgres: dbdebug: checkpointer
postgres 1169 0.0 0.0 1302636 13600 ? Ss 15:03 0:00 postgres: dbdebug: background writer
postgres 1170 0.0 0.0 1302120 6188 ? Ss 15:03 0:00 postgres: dbdebug: walwriter
postgres 1171 0.0 0.0 1304092 7040 ? Ss 15:03 0:00 postgres: dbdebug: autovacuum launcher
postgres 1172 0.0 0.0 129012 2560 ? Ss 15:03 0:00 postgres: dbdebug: archiver
postgres 1173 0.0 0.0 129144 3352 ? Ss 15:03 0:00 postgres: dbdebug: stats collector
postgres 1174 0.0 0.0 1303944 5788 ? Ss 15:03 0:00 postgres: dbdebug: logical replication launcher
root 21824 0.0 0.0 112724 2360 pts/1 R+ 15:09 0:00 grep --color postgres
[root@mdbtest2-dgtestdbs-lv01 ~]# cat /proc/{1144,1164,1168,1169,1170,1171,1172,1173,1174}/oom_score
0
0
0
0
0
0
0
0
0
[root@mdbtest2-dgtestdbs-lv01 ~]# cat /proc/{1144,1164,1168,1169,1170,1171,1172,1173,1174}/oom_score_adj
-1000
-1000
-1000
-1000
-1000
-1000
-1000
-1000
-1000
[root@mdbtest2-dgtestdbs-lv01 ~]#
复制
3. 调试环境变量是否生效
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj [postgres@mdbtest2-dgtestdbs-lv01 ~]$ export PG_OOM_ADJUST_VALUE=-1000 [postgres@mdbtest2-dgtestdbs-lv01 ~]$ gdb postgres -- 不要使用pg_ctl启动数据库,pg_ctl的入库跟postgres的入库不一样。 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/local/postgres/bin/postgres...done. (gdb) break fork_process.c:62 -- 设置断点 Breakpoint 1 at 0x81113a: file fork_process.c, line 62. (gdb) set follow-fork-mode child -- 设置要调试的是子进程,而不是父进程. (gdb) run -D /mdb/postgres/pg5432/ -- 启动postgresql数据库 Starting program: /usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432/ [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Breakpoint 1, fork_process () at fork_process.c:62 62 result = fork(); Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.0.9.el7.x86_64 (gdb) (gdb) p oomfilename $2 = 0xf2ea63 "/proc/self/oom_score_adj" (gdb) p fd $3 = 13 (gdb) p oomvalue $5 = 0xf2e8d4 "-1000" (gdb) p rc $4 = -1 (gdb)
复制
显然,程序在执行以下代码后返回值rc为-1,也就是系统调用write报错(PostgreSQL官方文档也说了,需要root用户设置相关环境变量才会生效,显然是没有权限写oom_score_adj文件),所以如果要设置oom环境变量只能使用系统服务启动PostgreSQL数据库:
rc = write(fd, oomvalue, strlen(oomvalue));
复制
下面修改程序src/backend/postmaster/fork_process.c
中的函数,在调用write函数后增加错误码处理:
重新编译:
make make install
复制
验证
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj -- 设置环境变量
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ export PG_OOM_ADJUST_VALUE=-1000 -- 设置环境变量
[postgres@mdbtest2-dgtestdbs-lv01 ~]$ gdb postgres
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/postgres/bin/postgres...done.
(gdb) set follow-fork-mode child -- 调试子进程
(gdb) break fork_process.c:63 if result==0 -- 设置条件断点
Breakpoint 1 at 0x811142: file fork_process.c, line 63.
(gdb) run -D /mdb/postgres/pg5432/ -- 启动需要调试的程序
Starting program: /usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New process 1440]
2021-01-14 17:39:20.248 CST [1436] LOG: redirecting log output to logging collector process
2021-01-14 17:39:20.248 CST [1436] HINT: Future log output will appear in directory "log".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Switching to Thread 0x7ffff7fdc740 (LWP 1440)]
Breakpoint 1, fork_process () at fork_process.c:63
63 if (result == 0)
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.0.9.el7.x86_64
(gdb) p result -- 命中断点后,检查fork函数的返回值。
$1 = 0
(gdb) n
86 oomfilename = getenv("PG_OOM_ADJUST_FILE");
(gdb) n
88 if (oomfilename != NULL)
(gdb) n
94 int fd = open(oomfilename, O_WRONLY, 0);
(gdb) n
97 if (fd >= 0)
(gdb) p fd
$2 = 13
(gdb) n
99 const char *oomvalue = getenv("PG_OOM_ADJUST_VALUE");
(gdb) n
102 if (oomvalue == NULL) /* supply a useful default */
(gdb) p oomvalue -- 查看获取的oom_adj是否正确
$3 = 0xf2e8d4 "-1000"
(gdb) p *oomvalue
$4 = 45 '-'
(gdb) n
105 rc = write(fd, oomvalue, strlen(oomvalue));
(gdb) n
106 int err = errno;
(gdb) p rc -- 检查write系统调用的返回值.
$5 = -1
(gdb) n
107 printf("errno is %d\n",err);
(gdb) p err -- 查看write系统调用的返回码。
$6 = 13
(gdb) n
errno is 13
110 close(fd);
(gdb)
复制
下面再次修改程序,使用系统调用perror输出错误信息:
重新编译:
make make install
复制
再试表示,显示是权限不足(**Permission denied **):
[postgres@mdbtest2-dgtestdbs-lv01 postgresql-13.0]$ echo PG_OOM_ADJUST_FILE
PG_OOM_ADJUST_FILE
[postgres@mdbtest2-dgtestdbs-lv01 postgresql-13.0]$ gdb postgres
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/postgres/bin/postgres...done.
(gdb) set follow-fork-mode child
(gdb) break fork_process.c:106
Breakpoint 1 at 0x8111c5: file fork_process.c, line 106.
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x00000000008111c5 in fork_process at fork_process.c:106
(gdb) run -D /mdb/postgres/pg5432/
Starting program: /usr/local/postgres/bin/postgres -D /mdb/postgres/pg5432/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New process 4077]
2021-01-14 17:49:31.373 CST [4064] LOG: redirecting log output to logging collector process
2021-01-14 17:49:31.373 CST [4064] HINT: Future log output will appear in directory "log".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Switching to Thread 0x7ffff7fdc740 (LWP 4077)]
Breakpoint 1, fork_process () at fork_process.c:106
106 perror("errno is ");
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.0.9.el7.x86_64
(gdb) n
errno is : Permission denied -- 系统调用write返回报错信息.
109 close(fd);
(gdb)
复制