作者
digoal
日期
2021-03-20
标签
PostgreSQL , recovery , open file , crash性能问题
背景
当数据库实例表很多时, 由于每个表包括main, fsm, init, toast等多个file fork, 如果有索引, 每个索引有对应的main, toast文件, 所以数据库实例的表很多时, 数据库里面会有很多文件, 在crash recovery 时, 需要递归open所有file, 导致recovery非常慢.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f3
```
Recursively fsync() the data directory after a crash.
author Robert Haas rhaas@postgresql.org
Mon, 4 May 2015 18:13:53 +0000 (14:13 -0400)
committer Robert Haas rhaas@postgresql.org
Mon, 4 May 2015 18:13:53 +0000 (14:13 -0400)
commit 2ce439f3379aed857517c8ce207485655000fc8e
tree 196ccf42f41939b101aa6d4587ca93c8060b51a2 tree
parent ec3d976bce7e322c29f1007d19b63b7a3a1a6ee4 commit | diff
Recursively fsync() the data directory after a crash.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.
Back-patch to all supported versions.
Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
```
PostgreSQL 14 增加参数recovery_init_sync_method, 使用syncfs方法加速, 仅支持Linux.
默认 recovery_init_sync_method=fsync
syncfs (Linux 5.8+)
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=61752afb26404dfc99a535c7a53f7f04dc110263
```
Provide recovery_init_sync_method=syncfs. master github/master
author Thomas Munro tmunro@postgresql.org
Fri, 19 Mar 2021 22:46:32 +0000 (11:46 +1300)
committer Thomas Munro tmunro@postgresql.org
Fri, 19 Mar 2021 23:07:28 +0000 (12:07 +1300)
commit 61752afb26404dfc99a535c7a53f7f04dc110263
tree dbb477a1f01f495a180e891028e3d1545532881d tree
parent b822ae13ea93c18326d58d47829bbc66d36fae5c commit | diff
Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery. This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.
Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory. This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.
The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.
Reported-by: Michael Brown michael.brown@discourse.org
Reviewed-by: Fujii Masao masao.fujii@oss.nttdata.com
Reviewed-by: Paul Guo guopa@vmware.com
Reviewed-by: Bruce Momjian bruce@momjian.us
Reviewed-by: Justin Pryzby pryzby@telsasoft.com
Reviewed-by: David Steele david@pgmasters.net
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
```
+#ifdef HAVE_SYNCFS
+ if (recovery_init_sync_method == RECOVERY_INIT_SYNC_METHOD_SYNCFS)
+ {
+ DIR *dir;
+ struct dirent *de;
+
+ /*
+ * On Linux, we don't have to open every single file one by one. We
+ * can use syncfs() to sync whole filesystems. We only expect
+ * filesystem boundaries to exist where we tolerate symlinks, namely
+ * pg_wal and the tablespaces, so we call syncfs() for each of those
+ * directories.
+ */
+
+ /* Sync the top level pgdata directory. */
+ do_syncfs(".");
+ /* If any tablespaces are configured, sync each of those. */
+ dir = AllocateDir("pg_tblspc");
+ while ((de = ReadDirExtended(dir, "pg_tblspc", LOG)))
+ {
+ char path[MAXPGPATH];
+
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(path, MAXPGPATH, "pg_tblspc/%s", de->d_name);
+ do_syncfs(path);
+ }
+ FreeDir(dir);
+ /* If pg_wal is a symlink, process that too. */
+ if (xlog_is_symlink)
+ do_syncfs("pg_wal");
+ return;
+ }
+#endif /* !HAVE_SYNCFS */
+
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.