锁 单个SESSION锁等待队列超过20个 #应用#
一、数据库状态检查的步骤/命令(通过本步骤可以确定数据库当前的状态是否OK)
检查连接情况,确保能正常登录、有应用连接、连接数使用率正常
col PROCESS_LIMIT for a20; select process_current,value process_limit,round(process_current / value, 2) * 100 || '%' as process_pct from (select count(*) as process_current from v$process) a, v$parameter where name = 'processes';复制
二、请描述告警出现的可能原因
锁等待,可能是应用逻辑或者语句效率问题
三、预计多久不处理就可以出现UIOC(P1)
持续等待则影响业务语句正常执行,需15分钟内处理正常
四、告警检查的步骤(通过本步骤确定告警是否是误告)
--单个会话的锁等待队列长度
select /*+rule*/'lock_num:'||nvl(max(count(aa.holding_session)),0) lock_num from dba_waiters aa group by aa.holding_session;复制
五、需要保存现场的步骤/命令
create table dbmgr.ash_20220000 as select tt.* from gv$active_session_history tt; create table dbmgr.lock_20220000 as select *from gv$lock; create table dbmgr.locked_object_20220000 as select *from GV$LOCKED_OBJECT;复制
六、需要反馈上级以及运营的信息
将堵塞的情况反馈运营/DA,同时确认业务的影响情况,运营确认是否kill holder会话
七、建议处理的流程,步骤和对应命令
1、检查堵塞情况,找到holder,发给运营评估能否kill
a. set line 1000 pagesize 999 col EVENT for a40 select s.username, decode(lc.block, 0, 'waiter: ', 'holder: ') || lc.sid sid, decode(lc.lmode, 0, 'none', 1, 'null', 2, 'row share', 3, 'row excl.', 4, 'share', 5, 's/row excl.', 6, 'exclusive', lmode, ltrim(to_char(lc.lmode, '990'))) lmode, decode(lc.request, 0, 'none', 1, 'null', 2, 'row share', 3, 'row excl.', 4, 'share', 5, 's/row excl.', 6, 'exclusive', request, ltrim(to_char(lc.request, '990'))) request, lc.type, decode( s.sql_id,null,s.PREV_SQL_ID,s.sql_id) sql_id, s.EVENT,s.BLOCKING_SESSION_STATUS,s.BLOCKING_SESSION,s.BLOCKING_INSTANCE, 'alter system kill session ' || '''' || S.SID || ',' || S.SERIAL# || ',@' || s.INST_ID || ''''||' immediate;' kill_command from gv$lock lc, gv$session s where (lc.id1, lc.id2, lc.type) in (select t.id1, t.id2, t.type from gv$lock t where t.request > 0) and lc.sid = s.sid and lc.inst_id = s.inst_id order by lc.id1, lc.request;复制
b. with l as( select /*+materialize*/ decode(request,0,'Holder:',' Waiter:') role, l.inst_id,l.sid, id1,id2,lmode,request,l.type from gv$lock l where (id1,id2,l.type) in (select id1,id2,type from gv$lock where request>0 ) order by id1,request) select l.*, s.SQL_ID, s.PREV_SQL_ID from l, gv$session s where l.sid = s.sid ; with l as (select/*+materialize*/ sid from gv$lock where block = 1) select 'alter system kill session '''||s.sid||','||serial#||',@'||inst_id||''' immediate;' as sql from gv$session s, l where s.sid = l.sid order by s.sid,s.serial#;复制
2、反馈运营会话详情,评估能否kill
--查看sid对应会话详情
set linesize 150 pagesize 999 col EVENT for a30 col USERNAME for a10 col MACHINE for a10 col OSUSER for a10 col PROGRAM for a10 select inst_id,sid,serial#,sql_id,event,blocking_instance,blocking_session,blocking_session_status,final_blocking_instance,final_blocking_session,final_blocking_session_status,username,status,LAST_CALL_ET,machine,osuser,program,sql_hash_value,to_char(logon_time,'yyyy-mm-dd hh24:mi:ss'),prev_sql_id from gv$session where sid=&sid;复制
--查看sql_id对应具体语句sql_text
set linesize 150 pagesize 999 select a.sql_id,a.child_number,a.sql_text,a.parsing_schema_name,a.first_load_time,a.last_load_time,a.last_active_time,a.plan_hash_value,a.sql_plan_baseline from gv$sql a where sql_id='&sql_id' order by last_active_time desc;复制
3、杀会话(注意核对会话信息)
执行步骤1生成的kill session语句
八、验证问题已经恢复的步骤和命令
--单个会话的锁等待队列长度,恢复为0则无堵塞
select /*+rule*/'lock_num:'||nvl(max(count(aa.holding_session)),0) lock_num from dba_waiters aa group by aa.holding_session;复制
LOCK_NUM
-------------------------------------------------
lock_num:0
oracle行锁等待处理
1、接到告警,有几个SQL大量行锁等待
2、查询top event sql
set linesize 1000 select inst_id,sql_id,EVENT,sum(DELTA_TIME),count(1) from gv$active_session_history ash where ash.SAMPLE_TIME >= sysdate - 1/24 group by inst_id,sql_id,EVENT order by count(1) desc,sum(DELTA_TIME) desc OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;复制
3、查SQL文本,确定被锁的表为 beneficiary_info
select
distinct sql_id,sql_text from gv$sqlarea t where t.sql_id in
('3cp53mvc1wvwj','brwrfn81as7nk');
4、查表锁情况及阻塞源
select sess.inst_id,sess.sid,sess.serial#, lo.oracle_username,lo.os_user_name, ao.object_name,lo.locked_mode, sess.prev_sql_id,sess.sql_id,sess.status,sess.LAST_CALL_ET,sess.LOGON_TIME,sess.event, sess.ROW_WAIT_OBJ#,sess.BLOCKING_SESSION,sess.FINAL_BLOCKING_SESSION,sess.BLOCKING_INSTANCE, (select s.sql_text from gv$sql s where s.sql_id=sess.sql_id and rownum=1) sql_text, (select s.sql_text from gv$sql s where s.sql_id=sess.prev_sql_id and rownum=1) sql_text1 from gv$locked_object lo,dba_objects ao,gv$session sess where ao.object_id=lo.object_id and lo.session_id=sess.sid and lo.inst_id = sess.inst_id and ao.object_name=upper('beneficiary_info');复制
发现有大量阻塞,且没有明确的阻塞源
5、sys登录数据库做hang分析
SYS@lucd01> oradebug setmypid Statement processed. SYS@lucd01> oradebug -g all hanganalyze 3 Hang Analysis in /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc SYS@lucd01> exit Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.5.0.0.0复制
6、快速截取hang分析关键信息
cnsz083184:lucd01 > grep -n "HANG ANALYSIS:" /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc
2445:HANG ANALYSIS:
cnsz083184:lucd01 > vi /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc
vi后esc :2440 跳转到相应行
从HANG ANALYSIS:开始截取,截取到主要阻塞,重点关注 Chains
most likely to have caused the hang 内容
*** 2022-07-28T14:39:38.300811+08:00 =============================================================================== HANG ANALYSIS: instances (db_name.oracle_sid): lucd0.lucd01, lucd0.lucd02, lucd0.lucd03 oradebug_node_dump_level: 3 analysis initiated by oradebug os thread scheduling delay history: (sampling every 1.000000 secs) 0.000000 secs at [ 14:39:37 ] NOTE: scheduling delay has not been sampled for 0.630678 secs 0.000000 secs from [ 14:39:33 - 14:39:38 ], 5 sec avg 0.000000 secs from [ 14:38:38 - 14:39:38 ], 1 min avg 0.000000 secs from [ 14:34:38 - 14:39:38 ], 5 min avg vktm time drift history =============================================================================== Chains most likely to have caused the hang: [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 [b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 2 Signature Hash: 0x38c48850 [c] Chain 3 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 3 Signature Hash: 0x38c48850 =============================================================================== Non-intersecting chains: ------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (lucd0.lucd01) os id: 377755 process id: 1783, oracle@cnsz083184 session id: 1615 session serial #: 14965 module name: 0 (JDBC Thin Client85 (TNS V1-V3)3)SH046036 (TNS V1) } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x15d000b p3: 'sequence'=0x2f6fd8 px1: 'pdb uid'=0x0 px2: 'master hint'=0x2 time in wait: 1741 min 15 sec timeout after: never wait id: 7894 blocking: 0 sessions current sql_id: 911516574 current sql: update beneficiary_info set applicant_relation = :1 , applicant_relation_desc = :2 where polno = :3 and ben_type = :4 and benef_seq = :5 and benef_sts = 1 short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2179<-kslwaitctx()+192<-kjusuc()+4698<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-kauupd()+356<-updrow()+1699<-qerupUpdRow()+725<-qerupFetch()+644<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+443<-opiexe()+11815<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1903<-opiino()+936<-opiodr()+1202<- wait history: * time between current wait and wait #1: 0.000043 sec 1. event: 'gc current block busy' time waited: 0.000165 sec wait id: 7893 p1: ''=0x8ee p2: ''=0xe40d6 p3: ''=0x2000001 * time between wait #1 and #2: 0.000182 sec 2. event: 'gc cr block 2-way' time waited: 0.000265 sec wait id: 7892 p1: ''=0x8ee p2: ''=0xe40d6 p3: ''=0x1 * time between wait #2 and #3: 0.000314 sec 3. event: 'SQL*Net message from client' time waited: 9.914156 sec wait id: 7891 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } and is blocked by => Oracle session identified by: { instance: 1 (lucd0.lucd01) os id: 61727 process id: 1165, oracle@cnsz083184 session id: 7528 session serial #: 29673 module name: 0 (JDBC Thin Client85 (TNS V1-V3)))dg-79b9f9b7dd-sg) } which is waiting for 'SQL*Net message from client' with wait info: { p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 time in wait: 2.929703 sec timeout after: never wait id: 151340 blocking: 16 sessions current sql_id: 0 current sql: <none> short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+913<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245 wait history: * time between current wait and wait #1: 0.000034 sec 1. event: 'SQL*Net message to client' time waited: 0.000002 sec wait id: 151339 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000006 sec 2. event: 'SQL*Net break/reset to client' time waited: 0.001725 sec wait id: 151338 p1: 'driver id'=0x54435000 p2: 'break?'=0x0 * time between wait #2 and #3: 0.000001 sec 3. event: 'SQL*Net break/reset to client' time waited: 0.000004 sec wait id: 151337 p1: 'driver id'=0x54435000 p2: 'break?'=0x1 } Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Chain 2: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (lucd0.lucd01) os id: 423707 process id: 3271, oracle@cnsz083184 session id: 7120 session serial #: 1978 module name: 0 (JDBC Thin Client(TNS V1-V3)V3)3)dg-79b9f9b7dd-rz) } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x600013 p3: 'sequence'=0x1fad46 px1: 'pdb uid'=0x0 px2: 'master hint'=0x2 time in wait: 1046 min 56 sec timeout after: never wait id: 13477 blocking: 0 sessions current sql_id: 3153260194 current sql: update beneficiary_info i set i.sex = :1 , i.country = :2 , i.id_type = :3 , i.idno_validity_date = :4 , i.prof_code = :5 , i.ben_addr = :6 , i.ben_tel = :7 where i.polno=:8 and i.clientno= short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2179<-kslwaitctx()+192<-kjusuc()+4698<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-kauupd()+356<-updrow()+1699<-qerupUpdRow()+725<-qerupFetch()+644<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+443<-opiexe()+11815<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1903<-opiino()+936<-opiodr()+1202<- wait history: * time between current wait and wait #1: 0.000331 sec 1. event: 'SQL*Net message from client' time waited: 0.002083 sec wait id: 13476 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000157 sec 2. event: 'SQL*Net message to client' time waited: 0.000002 sec wait id: 13475 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.000196 sec 3. event: 'SQL*Net message from client' time waited: 8.129646 sec wait id: 13474 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } and is blocked by => Oracle session identified by: { instance: 1 (lucd0.lucd01) os id: 117934 process id: 827, oracle@cnsz083184 session id: 8504 session serial #: 40824 module name: 0 (JDBC Thin Client84 (TNS V1-V3)3)g-79b9f9b7dd-hhp) } which is waiting for 'SQL*Net message from client' with wait info: { p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 time in wait: 6.853693 sec timeout after: never wait id: 172510 blocking: 2 sessions current sql_id: 0 current sql: <none> short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+913<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245 wait history: * time between current wait and wait #1: 0.000007 sec 1. event: 'SQL*Net message to client' time waited: 0.000001 sec wait id: 172509 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000106 sec 2. event: 'db file sequential read' time waited: 0.000512 sec wait id: 172508 p1: 'file#'=0x5d1 p2: 'block#'=0x219c5b p3: 'blocks'=0x1 * time between wait #2 and #3: 0.000662 sec 3. event: 'SQL*Net message from client' time waited: 0.265024 sec wait id: 172507 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 2 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Chain 3: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (lucd0.lucd01) os id: 338848 process id: 2047, oracle@cnsz083184 session id: 7675 session serial #: 39507 module name: 0 (JDBC Thin Client85 (TNS V1-V3)3)dg-79b9f9b7dd-dk) } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0xcd0011 p3: 'sequence'=0x2e8cea px1: 'pdb uid'=0x0 px2: 'master hint'=0x5 time in wait: 140 min 13 sec timeout after: never wait id: 45070 blocking: 0 sessions current sql_id: 3153260194 current sql: update beneficiary_info i set i.sex = :1 , i.country = :2 , i.id_type = :3 , i.idno_validity_date = :4 , i.prof_code = :5 , i.ben_addr = :6 , i.ben_tel = :7 where i.polno=:8 and i.clientno= short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2179<-kslwaitctx()+192<-kjusuc()+4698<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-kauupd()+356<-updrow()+1699<-qerupUpdRow()+725<-qerupFetch()+644<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+443<-opiexe()+11815<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1903<-opiino()+936<-opiodr()+1202<- wait history: * time between current wait and wait #1: 0.000044 sec 1. event: 'gc current block 2-way' time waited: 0.000102 sec wait id: 45069 p1: ''=0x5 p2: ''=0x1b0 p3: ''=0x1a9 * time between wait #1 and #2: 0.000041 sec 2. event: 'Disk file operations I/O' time waited: 0.000007 sec wait id: 45068 p1: 'FileOperation'=0x2 p2: 'fileno'=0x5 p3: 'filetype'=0x2 * time between wait #2 and #3: 0.000191 sec 3. event: 'gc current block busy' time waited: 0.000161 sec wait id: 45067 p1: ''=0x8cc p2: ''=0x34607c p3: ''=0x2000001 } and is blocked by => Oracle session identified by: { instance: 3 (lucd0.lucd03) os id: 240730 process id: 3294, oracle@cnsz083186 session id: 8722 session serial #: 48990 module name: 0 (JDBC Thin Clientconnection-prd-dg-79b9f9b7dd-sp8) } which is waiting for 'SQL*Net message from client' with wait info: { p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 time in wait: 0.763401 sec timeout after: never wait id: 121252 blocking: 7 sessions current sql_id: 0 current sql: <none> short stack: ksedsts()+783<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+913<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245 wait history: * time between current wait and wait #1: 0.000006 sec 1. event: 'SQL*Net message to client' time waited: 0.000001 sec wait id: 121251 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000086 sec 2. event: 'db file sequential read' time waited: 0.000711 sec wait id: 121250 p1: 'file#'=0x5e6 p2: 'block#'=0x1b1c5d p3: 'blocks'=0x1 * time between wait #2 and #3: 0.000007 sec 3. event: 'gc current grant busy' time waited: 0.000147 sec wait id: 121249 p1: ''=0x5e6 p2: ''=0x1b1c5d p3: ''=0x2000001 } Chain 3 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 3 Signature Hash: 0x38c48850复制
7、分析阻塞源
从上面的信息中找到 is blocked by,提取里面的session信息,就是阻塞源
同时结合锁表情况,发现阻塞源是这4个INACTIVE的session
8、联系运营kill掉阻塞源
alter system kill session '&sid,&serial#,@&inst_id';
再次查询锁表情况,看效果