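Platform: Kunpeng 916 + openEuler 22.03, with openGauss 6.0 single-node lite edition installed. The database crashes during normal operation, and there is no pattern to when it happens. Log output at the time of the crash: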
2025-01-03 15:18:00.088 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [UNDO] LOG: [UndoRecycleMain:988]update globalRecycleXid: oldestXmin=3610745, recycleXmin=3610745, globalFrozenXid=3513464, globalRecycleXid=3610743, newRecycleXid=3610745.
2025-01-03 15:18:03.024 omm postgres localhost 70384045436640 0[0:0#0] 0 [BACKEND] LOG: clean statement thread start
2025-01-03 15:18:06.121 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [UNDO] LOG: [UndoRecycleMain:988]update globalRecycleXid: oldestXmin=3610747, recycleXmin=3610747, globalFrozenXid=3513464, globalRecycleXid=3610745, newRecycleXid=3610747.
2025-01-03 15:18:18.186 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [UNDO] LOG: [UndoRecycleMain:988]update globalRecycleXid: oldestXmin=3610751, recycleXmin=3610751, globalFrozenXid=3513464, globalRecycleXid=3610747, newRecycleXid=3610751.
2025-01-03 15:18:24.219 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [BACKEND] WARNING: [UndoRecycleMain:979]curr xid having undo 3610721 < global globalRecycleXid 3610751.
2025-01-03 15:18:24.223 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [UNDO] PANIC: [VerifyRecycleXidAdvance:153]Advance recycle xid failed, oldestRecycleXid 3610721 is smaller than globalRecycleXid 3610751.
2025-01-03 15:18:24.223 [unknown] [unknown] localhost 70374483930848 0[0:0#0] 0 [UNDO] BACKTRACELOG: tid[3269702]'s backtrace:
/opt/opengauss/bin/gaussdb(+0xd8a878) [0xaaaacae9a878]
/opt/opengauss/bin/gaussdb(_Z9errfinishiz+0x4a4) [0xaaaacae8d874]
/opt/opengauss/bin/gaussdb(_ZN4undo23VerifyRecycleXidAdvanceEmm11VerifyLevel+0x114) [0xaaaacb8fec34]
/opt/opengauss/bin/gaussdb(_ZN4undo15UndoRecycleMainEv+0x6c0) [0xaaaacb900930]
/opt/opengauss/bin/gaussdb(_Z17GaussDbThreadMainIL15knl_thread_role56EEiP14knl_thread_arg+0x3ec) [0xaaaacb3c50dc]
/opt/opengauss/bin/gaussdb(+0x128ef30) [0xaaaacb39ef30]
/usr/lib64/libc.so.6(+0x82508) [0x400028c02508]
/usr/lib64/libc.so.6(+0xe9cdc) [0x400028c69cdc]
Use addr2line to get pretty function name and line
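Judging from the log, the oldest transaction ID that still has undo records (3610721) is smaller than the global recycle XID (3610751), which suggests the undo data was not recycled correctly. The recycle XID advance check (VerifyRecycleXidAdvance) then fails and the database PANICs. But what could the root cause be?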
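To see whether the same pattern shows up before earlier crashes, the relevant values can be pulled out of the server log with a small script. This is only a rough sketch: the regular expressions are written against the exact log lines quoted above and may not match other openGauss versions, and the function name `scan_undo_log` is made up for this post.

```python
import re

# Patterns are based only on the log lines quoted in this post (assumption:
# other versions may word these messages differently).
UPDATE_RE = re.compile(
    r"update globalRecycleXid: .*?globalRecycleXid=(\d+), newRecycleXid=(\d+)"
)
PANIC_RE = re.compile(
    r"oldestRecycleXid (\d+) is smaller than globalRecycleXid (\d+)"
)


def scan_undo_log(lines):
    """Collect recycle XID updates and failed advance checks from log lines.

    Returns (updates, violations):
      updates    -- list of (globalRecycleXid, newRecycleXid) per update line
      violations -- list of (oldestRecycleXid, globalRecycleXid, gap) per PANIC line
    """
    updates = []
    violations = []
    for line in lines:
        m = UPDATE_RE.search(line)
        if m:
            updates.append((int(m.group(1)), int(m.group(2))))
            continue
        m = PANIC_RE.search(line)
        if m:
            oldest = int(m.group(1))
            global_xid = int(m.group(2))
            # gap = how far the global recycle XID is ahead of the oldest
            # XID that still has undo records
            violations.append((oldest, global_xid, global_xid - oldest))
    return updates, violations
```

Running it over the lines of each crash log (for example `scan_undo_log(open(path, encoding="utf-8", errors="replace"))`, where `path` is the log file) shows how far the global recycle XID had moved ahead of the lagging XID each time, which may help correlate the crashes with long-running transactions or other activity.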