[[toc]]
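Scope
When a file system fault causes the double write files to be lost or corrupted, and there is no backup or other more suitable remedy, the method in this document can be tried to bring the database up in an emergency.
Problem Overview
When the double write feature is enabled (it is enabled by default), the database cannot start once the double write files are lost.
Startup fails with the log below. It reports that the DW file does not exist, and the backtrace shows that the failure occurs during DW initialization.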
2022-06-20 16:48:06.687 [unknown] [unknown] localhost 140123442810048 0 0 [BACKEND] LOG: start create thread!
2022-06-20 16:48:06.687 [unknown] [unknown] localhost 140123442810048 0 0 [BACKEND] LOG: create thread end!
2022-06-20 16:48:06.693 [unknown] [unknown] localhost 140122435614464 0 0 [BACKEND] LOG: [Alarm Module]alarm checker started.
2022-06-20 16:48:06.694 [unknown] [unknown] localhost 140122416674560 0 0 [BACKEND] LOG: reaper backend started.
2022-06-20 16:48:06.717 [unknown] [unknown] localhost 140122357491456 0 0 [REDO] LOG: [mpfl_ulink_file]: unlink global/max_page_flush_lsn sucessfully! ret:4294967295
2022-06-20 16:48:06.717 [unknown] [unknown] localhost 140122357491456 0 0 [BACKEND] LOG: StartupXLOG: biggest_lsn_in_page is set to FFFFFFFF/FFFFFFFF, enable_update_max_page_flush_lsn:0
2022-06-20 16:48:06.717 [unknown] [unknown] localhost 140122357491456 0 0 [BACKEND] LOG: database system timeline: 17
2022-06-20 16:48:06.717 [unknown] [unknown] localhost 140122357491456 0 0 [BACKEND] LOG: database system was shut down at 2022-06-20 16:47:32 CST
2022-06-20 16:48:06.720 [unknown] [unknown] localhost 140122357491456 0 0 [DBL_WRT] - [ ] PANIC: batch flush DW file does not exist <<<<<
2022-06-20 16:48:06.720 [unknown] [unknown] localhost 140122357491456 0 0 [DBL_WRT] BACKTRACELOG: tid[2843]'s backtrace:
/opt/og/bin/gaussdb(+0x9f16e2) [0x5625215106e2]
/opt/og/bin/gaussdb(_Z9errfinishiz+0x31c) [0x56252150274c]
/opt/og/bin/gaussdb(_Z25dw_file_check_and_rebuildv+0x128) [0x562521da4c68]
/opt/og/bin/gaussdb(_Z7dw_initb+0x7d) [0x562521da870d]
/opt/og/bin/gaussdb(_Z11StartupXLOGv+0x177b) [0x562521dddfbb]
/opt/og/bin/gaussdb(_Z18StartupProcessMainv+0x1ac) [0x5625219add4c]
/opt/og/bin/gaussdb(_Z26GaussDbAuxiliaryThreadMainIL15knl_thread_role26EEiP14knl_thread_arg+0xe0) [0x5625219a8f20]
/opt/og/bin/gaussdb(_Z17GaussDbThreadMainIL15knl_thread_role26EEiP14knl_thread_arg+0x245) [0x5625219a9185]
/opt/og/bin/gaussdb(+0xe6dc25) [0x56252198cc25]
/lib64/libpthread.so.0(+0x7e65) [0x7f70ffbebe65]
/lib64/libc.so.6(clone+0x6d) [0x7f70ff91488d]
Use addr2line to get pretty function name and line
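Cause
At startup the database must initialize double write and, if necessary, run DW recovery. Because the DW files are missing, this step fails and startup is aborted.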
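Before attempting a repair, it can help to confirm which double write files are actually absent. The following is a minimal sketch, not part of any MogDB tooling: `check_dw_files` is a hypothetical helper name, and the file names `pg_dw` and `pg_dw_single` under `global/` are taken from the `gs_initdb` listing shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: report which double write files are missing
# from a data directory. Returns 0 if both exist, 1 otherwise.
# File names (pg_dw, pg_dw_single) come from the listing in this article.
check_dw_files() {
    datadir="$1"
    missing=0
    for f in pg_dw pg_dw_single; do
        if [ ! -f "$datadir/global/$f" ]; then
            echo "missing: $datadir/global/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_dw_files data` prints one line per missing file and returns a non-zero status if any file is absent.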
Solution
1. Initialize a new temporary cluster, then copy the newly generated DW files into the cluster that needs to be repaired.
$ gs_initdb -D ./tmpdata --nodename test
$ ls tmpdata/global/pg_dw*
tmpdata/global/pg_dw tmpdata/global/pg_dw_single
$ cp ./tmpdata/global/pg_dw* data/global/
2. Edit the parameter file and set enable_double_write = off.
The database can now be started normally.
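The parameter change in step 2 can be made by hand, or scripted along the lines below. This is only a sketch: `set_conf_param` is a hypothetical helper, and it assumes the parameter file is `postgresql.conf` in the data directory. It keeps a `.bak` copy, rewrites an existing (possibly commented-out) setting, and appends the setting if none is present.

```shell
#!/bin/sh
# Hypothetical helper: set "name = value" in a parameter file.
# Assumption: the file uses "name = value" lines, with "#" for comments.
set_conf_param() {
    conf="$1"; name="$2"; value="$3"
    # Keep a backup of the original file before changing it.
    cp "$conf" "$conf.bak"
    if grep -Eq "^[[:space:]]*#?[[:space:]]*$name[[:space:]]*=" "$conf"; then
        # Rewrite the existing line, whether active or commented out.
        sed -E "s|^[[:space:]]*#?[[:space:]]*$name[[:space:]]*=.*|$name = $value|" \
            "$conf.bak" > "$conf"
    else
        # No such line yet: append the setting.
        echo "$name = $value" >> "$conf"
    fi
}
```

For example, `set_conf_param data/postgresql.conf enable_double_write off`. Any trailing comment on the rewritten line is dropped, so review the file afterwards.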
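Note:
If any pages were only partially written (fractured pages) and would have needed double write to be repaired, data may be lost.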
References
https://docs.mogdb.io/zh/mogdb/v2.1/2-checkpoints#enable_double_write