巡检发现某生产库 TOP 1等待事件为enq: CR - block range reuse ckpt,经分析定位为UNDO STEAL导致,这里主要测试重现ORACLE STEAL UNDO EXTENT及enq: CR - block range reuse ckpt等待事件产生的过程。
模拟过程
1、创建表空间
SQL> create undo tablespace undo2 datafile '/oracle/app/oracle/oradata/NOCDB19C/undo2.dbf' size 20m;
Tablespace created.
SQL> alter system set undo_tablespace=undo2;
System altered.
SQL> select RETENTION from dba_tablespaces where TABLESPACE_NAME='UNDO2';
RETENTION
-----------
NOGUARANTEE
SQL> @p undo_retention
NAME VALUE
---------------------------------------- ----------------------------------------
undo_retention 900
2、登录数据库创建测试表,查找对应SERVER SPID,GDB设置断点
create table dbmt.tobj as select * from dba_objects;
在ORACLE UNDO STEAL相关函数上设置断点
(gdb) break ktusm_stealext
Breakpoint 2 at 0x1c48910
(gdb) c
3、更新dbmt.tobj表,产生UNDO占用
72634 rows updated.
SQL> @trans
SID SERIAL# USERNAME TADDR SES_ADDR USED_UBLK USED_UREC 0xFLAG STATUS START_DATE XIDUSN XIDSLOT XIDSQN XID PRV_XID PTX_XID
---------- ---------- --------------------------- ---------------- ---------- ---------- --------- ----------------------------- ------------------ ---------- ---------- ---------- ---------------- ---------------- ----------------
1 49117 SYS 0000000073733698 000000007856C6F0 1026 76885 E03 ACTIVE 30-NOV-21 11 15 5 0B000F0005000000 0000000000000000 0000000000000000
commit;SQL> @df
TABLESPACE_NAME TotalMB UsedMB FreeMB % Used Ext Used
------------------------------ ---------- ---------- ---------- ------ --- ----------------------
ILMTSDEMO_LOWCOST 50 9 41 18% YES |#### |
ILM_SOURCE 12 1 11 9% YES |## |
SYSAUX 700 661 39 95% YES |################### |
SYSTEM 980 971 9 100% YES |####################|
TEMP 111 111 0 100% YES |####################|
UNDO2 20 12 8 60% NO |############ | <<<<
UNDOTBS1 645 300 345 47% YES |########## |
USERS 43 40 3 94% YES |################### |
col segment_name for a30select b.us#,segment_name,round(bytes/1024/1024,2)mb,b.STATUS$ from dba_segments a,undo$ b where tablespace_name='UNDO2' and a.segment_name=b.name;SQL> SQL>
US# SEGMENT_NAME MB STATUS$
---------- ------------------------------ ---------- ----------
11 _SYSSMU11_218934645$ 9 3 <<<<
12 _SYSSMU12_1212566812$ .13 3
13 _SYSSMU13_1508681309$ .13 3
14 _SYSSMU14_1297497020$ .13 3
15 _SYSSMU15_3396771108$ .13 3
16 _SYSSMU16_1123823253$ .13 3
17 _SYSSMU17_1247462769$ .13 3
18 _SYSSMU18_74029735$ .13 3
19 _SYSSMU19_3285312421$ .13 3
20 _SYSSMU20_2211160347$ .13 3
10 rows selected.
4、再次更新dbmt.tobj表,一次少更新几行,确定事务未分配到刚刚使用的UNDO SEGMENT上
98 rows updated.
SQL>
SQL> @trans
SID SERIAL# USERNAME TADDR SES_ADDR USED_UBLK USED_UREC 0xFLAG STATUS START_DATE XIDUSN XIDSLOT XIDSQN XID PRV_XID PTX_XID
---------- ---------- ------------ ---------------- ---------------- ---------- ---------- --------- ----------------------------- ------------------ ---------- ---------- ---------- ---------------- ---------------- ----------------
1 49117 SYS 0000000073733698 000000007856C6F0 1 6 E03 ACTIVE 30-NOV-21 19 16 5 1300100005000000 0000000000000000 0000000000000000
XIDUSN 19
col segment_name for a30
select b.us#,segment_name,round(bytes/1024/1024,2)mb,b.STATUS$ from dba_segments a,undo$ b where tablespace_name='UNDO2' and a.segment_name=b.name;
SQL>
US# SEGMENT_NAME MB STATUS$
---------- ------------------------------ ---------- ----------
11 _SYSSMU11_218934645$ 9 3
12 _SYSSMU12_1212566812$ .13 3
13 _SYSSMU13_1508681309$ .13 3
14 _SYSSMU14_1297497020$ .13 3
15 _SYSSMU15_3396771108$ .13 3
16 _SYSSMU16_1123823253$ .13 3
17 _SYSSMU17_1247462769$ .13 3
18 _SYSSMU18_74029735$ .13 3
19 _SYSSMU19_3285312421$ .13 3 <<<<
20 _SYSSMU20_2211160347$ .13 3
10 rows selected.
5、GDB 继续调试,观察UNDO SEGMENT会产生STEAL,但未观察到活动会话中的等待事务
SQL> update dbmt.tobj set object_name=object_name where object_id<100000;
SQL> @df
TABLESPACE_NAME TotalMB UsedMB FreeMB % Used Ext Used
------------------------------ ---------- ---------- ---------- ------ --- ----------------------
ILMTSDEMO_LOWCOST 50 9 41 18% YES |#### |
ILM_SOURCE 12 1 11 9% YES |## |
SYSAUX 700 661 39 95% YES |################### |
SYSTEM 980 971 9 100% YES |####################|
TEMP 111 111 0 100% YES |####################|
UNDO2 20 20 0 100% NO |####################|
UNDOTBS1 645 300 345 47% YES |########## |
USERS 43 40 3 94% YES |################### |
8 rows selected.
US# SEGMENT_NAME MB STATUS$
---------- ------------------------------ ---------- ----------
11 _SYSSMU11_218934645$ 9 3
12 _SYSSMU12_1212566812$ .13 3
13 _SYSSMU13_1508681309$ .13 3
14 _SYSSMU14_1297497020$ .13 3
15 _SYSSMU15_3396771108$ .13 3
16 _SYSSMU16_1123823253$ .13 3
17 _SYSSMU17_1247462769$ .13 3
18 _SYSSMU18_74029735$ .13 3
19 _SYSSMU19_3285312421$ 9 3
20 _SYSSMU20_2211160347$ .13 3
10 rows selected.
USERNAME SID EVENT MACHINE MODULE STATUS LAST_ET SQL_DT SQL_ID PLOBJ WAI_SECINW ROW_OBJ SQLTEXT BS OSUSER
----------- ------ -------------------- ---------- -------------------- -------- ------- ---------- --------------- -------- ---------- -------- ------------------------------ ---------- ----------
SYS 1 On CPU / runqueue rac1 SQL*Plus ACTIVE 128 128 7gn43xth7dhkz 3855:166 -1 update dbmt.tobj set object_n : oracle
Breakpoint 2, 0x0000000001c48910 in ktusm_stealext ()
(gdb) bt
#0 0x0000000001c48910 in ktusm_stealext ()
#1 0x0000000001c461cf in ktusmasp1r ()
#2 0x0000000001c4fc27 in ktrsexecExecuteInExcepHdlr ()
#3 0x0000000001c4ece1 in ktrsexec ()
#4 0x0000000001bf38db in ktuaspResumableInPdb ()
#5 0x0000000001c45505 in ktusmasp ()
#6 0x00000000126e5bbf in ktuchg2 ()
#7 0x00000000127177e7 in ktbchg2 ()
#8 0x00000000039d4ad1 in kdu_array_flush_retry ()
#9 0x00000000039d407c in kdu_array_flush1 ()
#10 0x0000000012b60510 in qerupUpdRow ()
#11 0x0000000012b5f023 in qerupRopRowsets ()
#12 0x0000000003b3897f in kdstf110110100001000km ()
#13 0x0000000012ac19f7 in kdsttgr ()
#14 0x0000000012b43054 in qertbFetch ()
#15 0x0000000012b5f378 in qerupFetch ()
#16 0x0000000012a053c8 in updaul ()
#17 0x0000000012a041a5 in updThreePhaseExe ()
#18 0x0000000012a0176b in updexe ()
#19 0x00000000129267fd in opiexe ()
#20 0x0000000012991a52 in kpoal8 ()
#21 0x000000001291fe22 in opiodr ()
#22 0x0000000012c849e7 in ttcpip ()
#23 0x000000000293e9c9 in opitsk ()
#24 0x00000000029432c8 in opiino ()
#25 0x000000001291fe22 in opiodr ()
#26 0x000000000293a746 in opidrv ()
#27 0x0000000003507375 in sou2o ()
#28 0x0000000000dbe406 in opimai_real ()
#29 0x0000000003513421 in ssthrdmain ()
#30 0x0000000000dbe230 in main ()
(gdb)
6、GDB 继续调试,观察UNDO SEGMENT会产生STEAL
(gdb) n
Single stepping until exit from function ktusm_stealext,
which has no line number information.
0x0000000001c461cf in ktusmasp1r ()
(gdb) n
Single stepping until exit from function ktusmasp1r,
which has no line number information.
Breakpoint 1, 0x0000000001e77a90 in kcbrbr_int ()
(gdb) n
Single stepping until exit from function kcbrbr_int,
which has no line number information.
0x0000000001e77a6f in kcbrbr ()
(gdb) n
Single stepping until exit from function kcbrbr,
which has no line number information.
0x000000000202f51d in kteopdelete1 ()
(gdb) n
Single stepping until exit from function kteopdelete1,
which has no line number information.
0x000000000208e469 in ktsxr_delete ()
(gdb) n
Single stepping until exit from function ktsxr_delete,
which has no line number information.
0x0000000001c146b6 in ktusp_delextent ()
(gdb) n
Single stepping until exit from function ktusp_delextent,
which has no line number information.
0x0000000001c12daa in ktuShrinkUndoSeg ()
(gdb) n
Single stepping until exit from function ktuShrinkUndoSeg,
which has no line number information.
0x0000000001c48ec4 in ktusmrsn ()
(gdb) n
Single stepping until exit from function ktusmrsn,
which has no line number information.
0x0000000001c470c0 in ktusmasp1r ()
(gdb) bt
#0 0x0000000001c470c0 in ktusmasp1r ()
#1 0x0000000001c4fc27 in ktrsexecExecuteInExcepHdlr ()
#2 0x0000000001c4ece1 in ktrsexec ()
#3 0x0000000001bf38db in ktuaspResumableInPdb ()
#4 0x0000000001c45505 in ktusmasp ()
#5 0x00000000126e5bbf in ktuchg2 ()
#6 0x00000000127177e7 in ktbchg2 ()
#7 0x00000000039d4ad1 in kdu_array_flush_retry ()
#8 0x00000000039d407c in kdu_array_flush1 ()
#9 0x0000000012b60510 in qerupUpdRow ()
#10 0x0000000012b5f023 in qerupRopRowsets ()
#11 0x0000000003b3897f in kdstf110110100001000km ()
#12 0x0000000012ac19f7 in kdsttgr ()
#13 0x0000000012b43054 in qertbFetch ()
#14 0x0000000012b5f378 in qerupFetch ()
#15 0x0000000012a053c8 in updaul ()
#16 0x0000000012a041a5 in updThreePhaseExe ()
#17 0x0000000012a0176b in updexe ()
#18 0x00000000129267fd in opiexe ()
#19 0x0000000012991a52 in kpoal8 ()
#20 0x000000001291fe22 in opiodr ()
#21 0x0000000012c849e7 in ttcpip ()
#22 0x000000000293e9c9 in opitsk ()
#23 0x00000000029432c8 in opiino ()
#24 0x000000001291fe22 in opiodr ()
#25 0x000000000293a746 in opidrv ()
#26 0x0000000003507375 in sou2o ()
#27 0x0000000000dbe406 in opimai_real ()
#28 0x0000000003513421 in ssthrdmain ()
#29 0x0000000000dbe230 in main ()
US# SEGMENT_NAME MB STATUS$
---------- ------------------------------ ---------- ----------
11 _SYSSMU11_218934645$ 8.88 3 <<<<被偷窃,从9M缩小到8.88
12 _SYSSMU12_1212566812$ .13 3
13 _SYSSMU13_1508681309$ .13 3
14 _SYSSMU14_1297497020$ .13 3
15 _SYSSMU15_3396771108$ .13 3
16 _SYSSMU16_1123823253$ .13 3
17 _SYSSMU17_1247462769$ .13 3
18 _SYSSMU18_74029735$ .13 3
19 _SYSSMU19_3285312421$ 9.13 3<<<<活动事务使用,增大
20 _SYSSMU20_2211160347$ .13 3
10 rows selected.
设置10046 重复上面的过程,可以观察到enq: CR - block range reuse ckpt
前面通过GDB加断点测试,捕捉到了ktusm_stealext函数调用,但观察不到等待enq: CR - block range reuse ckpt,猜想这个等待是毫秒级的,如果加断点的位置并不是加wait event之后,解除之前应该是看不到的,通过10026再观察。
先尝试STEAL,如果不成功,才能使用FREE EXTENT
通过trace 观察到了enq: CR - block range reuse ckpt
nocdb19c_ora_2279.trc
行 558: WAIT #139934742178240: nam='enq: CR - block range reuse ckpt' ela= 1446 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126482933
行 581: WAIT #139934742178240: nam='enq: CR - block range reuse ckpt' ela= 597 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126486780
行 604: WAIT #139934742178240: nam='enq: CR - block range reuse ckpt' ela= 1652 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126491195
并且在等待enq: CR - block range reuse ckpt之前的等待事件为reliable message
WAIT #139934742178240: nam='reliable message' ela= 301 channel context=1989491688 channel handle=1989402640 broadcast message=1991183576 obj#=-1 tim=1126481460
WAIT #139934742178240: nam='enq: CR - block range reuse ckpt' ela= 1446 name|mode=1129447430 2=65570 0=1 obj#=-1 tim=1126482933
等待是发生成update执行阶段,并非解析SQL阶段
PARSING IN CURSOR #139934742178240 len=65 dep=0 uid=0 oct=6 lid=0 tim=1126068945 hv=149629389 ad='69cba1c0' sqlid='cucft6n4fqafd'
update dbmt.tobj2 set object_name=object_name where object_id>100
等待在以下递归SQL之后,看起来就是递归UNDO相关操作。
PARSING IN CURSOR #139934736487328 len=85 dep=1 uid=0 oct=3 lid=0 tim=1126297347 hv=3573688918 ad='69475af8' sqlid='86708bvah4akq'
select name from undo$ where file#=:1 and block#=:2 and ts#=:3 and status$ != 1