今天某交管客户出现登录异常,应用反馈无法登录数据库。数据库版本为11.2.0.4,patch版本没注意,先暂时不关注。使用客户提供的异常用户测试登录发现确实会hang住。
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
SQL*Plus: Release 11.2.0.4.0 Production on Wed Oct 27 14:48:25 2021
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> conn centernew/XXXXXX
。。。hang
当时等待事件为:
INST_ID EVENT COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 row cache lock 48
1 library cache lock 1
1 PX Deq: Execute Reply 1
1 PX Deq: Execution Msg 1
1 cursor: pin S wait on X 2
1 library cache load lock 5
2 row cache lock 48
2 PX Deq: Execution Msg 1
2 cursor: pin S wait on X 6
2 library cache load lock 5
当看到row cache lock和library cache lock一般来说需要分析是row cache的哪个组件发生争用,library cache lock的namespace等等。
进一步分析:
USERNAME EVENT P1 FINAL_BLOCKING_INSTANCE FINAL_BLOCKING_SESSION COUNT(*)
-------------- ------------------------ ---------- ----------------------- ---------------------- ----------
row cache lock 10 19
row cache lock 10 2 2838 23
CENTERNEW row cache lock 10 29
CENTERNEW row cache lock 10 2 2838 26
从11.2之后如果是密码错误导致的密码延迟一般产生的是library cache lock,这里发现有很多session username是空的,这就说明是那些登录异常的session,都是等待row cache lock,通过p1可以判断是dc_users组件。最终阻塞会话为2838,但是尝试去kill 2838时发现并没有解决问题,只是换了一个最终阻塞会话。原因在于row cache lock是一个enqueue,虽然没有TX那么多mode,它只有S(共享)和X(排他),其中S与S兼容,S阻塞X,X阻塞S,X阻塞X。如果这时有一个X模式的请求进入到enqueue中并且没有释放的话,那么会阻塞后续所有S和X的请求。
由于急于恢复业务,kill掉所有local=no的session。但仍然可以通过diag生成的systatedump去分析。
从row cache enqueue state object中可以看到有dc_users的X模式的请求,:
SO: 0x2f863e8230, type: 56, owner: 0x2fe20af948, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x2fc14626f0, name=transaction, file=ktccts.h LINE:410, pg=0
(trans) flg = 0x00200001, flg2 = 0x004c0000, flg3 = 0x00000000, prx = (nil), ros = 2147483647, crtses=0x2fe20af948
flg = 0x00200001: ALC DDL
flg2 = 0x004c0000: PGA NIP DDID
flg3 = 0x00000000:
bsn = 0x54d bndsn = 0x54d spn = 0x5a0
efd = 22 rfd = 0 DID:
file:opiprs.c lineno:3136
parent xid: 0x0000.000.00000000
env [0x2f863e8648]: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000000)
cev: (spc = 0 arsp = (nil) ubkds (ubk:tsn: 0 rdba: 0x00000000 flag:0x0 hdl:(nil) addr:(nil)) useg tsn: 0 rdba: 0x00000000
hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00
num bl: 0 bk list: 0x0)
cr opc: 0x0 spc: 0 uba: 0x00000000.0000.00
Begin scn:0x0000.00000000 uba:0x00000000.0000.00 ts:1635318100[10/27/2021 15:01:40]
Undo blks: 0 recs: 0
ccbstg: 0x00000000
(enqueue) released enqueue or enqueue in flux
lock_flag: 0x0, lock: 0x2f863e82a8, res: 0x2fe25dd508
own: 0x2fe20af948, sess: 0x2fe20af948, prv: 0x2f863e82b8
xga: (nil), heap: UGA
tsnl:0x2db343b400 nent:0 nxt:(nil)
Trans IMU st: 0 Pool index 65535, Redo pool 0x2f863e89f0, Undo pool 0x2f863e8ad8
Redo pool range [0x7f70cc3a7b18 0x7f70cc3a7b18 0x7f70cc3aa318]
Undo pool range [0x7f70cc3a5318 0x7f70cc3a5318 0x7f70cc3a7b18]
chnf control flags 0x0 CHNF hwm uba uba: 0x00000000.0000.00 ----------------------------------------
SO: 0x2dcb5b3c48, type: 75, owner: 0x2f863e8230, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0x2fc14626f0, name=row cache enqueues, file=kqr.h LINE:2075, pg=0
row cache enqueue: count=1 session=0x2fe20af948 object=0x2da7571770, request=X
savepoint=0x5a0
row cache parent object: address=0x2da7571770 cid=10(dc_users)
hash=59fb0f49 typ=21 transaction=(nil) flags=00000002
own=0x2da7571838[0x2da7571838,0x2da7571838] wat=0x2da7571848[0x2dcb5b3cf0,0x2dd6857b50] mode=S
status=VALID/-/-/-/-/-/-/-/-
request=X release=FALSE flags=a
instance lock=QK 59fb0f49 fa408245
set=0, complete=FALSE
set=1, complete=FALSE
set=2, complete=FALSE
data=
0000005e 45430009 5245544e 0057454e 00000000 00000000 00000000 00000000
00000000 44440010 34344634 43383746 46453045 00004530 00000000 00000000
00000000 45440016 4c554146 4f435f54 4d55534e 475f5245 50554f52 00000000
00000000 00000007 00000003 00000000 00000000 00000001 0a797801 20080e1b
00000000 00000000 00000000 00000000 00000000 00000000 3a53003e 44464633
33414534 35424531 32363839 37463936 37333243 39383533 30394141 42394442
42434436 42453135 37313637 45313330 43443830 43413641 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 ffffffff 7ffffffc 00000000
00000000 59fb0f49 a7571770 0000002d a7c04590 0000002f a7c04590 0000002f
00000001 54314582 a7571770 0000002d a7c3a920 0000002f a7c3a920 0000002f
00000002 00000000 a7571770 0000002d a7571a78 0000002d a7571a78 0000002d
00000005 00000000 cac1d800 00007fc6 00000000 00000000 00000000 00000000
a7571770 0000002d a7571ab0 0000002d a7571ab0 0000002d a7571ad8 0000002d
0000000a 59fb0f49 fa408245 00000000 00000101 00000000 0000011d 00000000
bbb1ab88 0000002f c9458628 0000002f bfa14a00 0000002d b2bbbe98 0000002d
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 b2976208 0000002f 00000001 54552caa 00050308 08910321
a7571b48 0000002d a7571b48 0000002d 076effe6 00000000 cac1d820 00007fc6
b2976260 0000002f b2976260 0000002f 00000000 00000000 06fc053c 00000000
cac1d820 00007fc6 bbb1ab88 0000002f b2bbbf38 0000002d bfa14aa0 0000002d
d9eec508 0000002f 00000002 00000001 a7571bb8 0000002d a7571bb8 0000002d
00000000 00000000 00000000 00000000 a7571bd8 0000002d a7571bd8 0000002d
a7571be8 0000002d a7571be8 0000002d c276bf80 0000002f c276bf80 0000002f
00000000 00001282 cac1d808 00007fc6 00000000 00000000 00000000 00000000
a7571c28 0000002d a7571c28 0000002d 00000000 00000000 00000000 00000000
f5d3fb88 00007fff 10600160 08000200 03050001 00000000 06fc053c 00000000
cac1d820 00007fc6 00000000 00000000 7a82ab73 00000002 00000fb2 00000000
a7571838 0000002d
通过0x2fe20af948去搜索session state object,可以找到对应的sql
SO: 0x2fe20af948, type: 4, owner: 0x2fc14626f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x2fc14626f0, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 4937 ser: 1 trans: 0x2f863e8230, creator: 0x2fc14626f0
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x48009) -/DDLT2/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 100 oct: 17, prv: 0, sql: 0x2db37f7f88, psql: 0x2dcbb1eda8, user: 94/CENTERNEW
ksuxds FALSE at location: 0
service name: qfjsdb
client details:
O/S info: user: WIN-LH84TQS6F99$, term: unknown, ospid: 1234
machine: WIN-LH84TQS6F99 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
0: waiting for 'row cache lock'
cache id=0xa, mode=0x0, request=0x5
wait_id=1260 seq_num=1261 snap_id=1
wait times: snap=0.000403 sec, exc=0.000403 sec, total=0.000403 sec
wait times: max=3.000000 sec, heur=4 min 2 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a2
There is at least one session blocking this session.
Dumping first 3 direct blockers:
inst: 2, sid: 1262, ser: 3
inst: 2, sid: 3363, ser: 5
inst: 2, sid: 3049, ser: 1
Dumping final blocker:
inst: 2, sid: 1262, ser: 3
There are 1 sessions blocked by this session.
根据0x2db37f7f88去搜索LibraryHandle找到sql文本
LibraryHandle: Address=0x2db37f7f88 Hash=f92acd06 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=grant execute any procedure to centernew
FullHashValue=689002101ac48dadbb9777a1f92acd06 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=4180331782 OwnerIdn=94
Statistics: InvalidationCount=2 ExecutionCount=3 LoadCount=4 ActiveLocks=1 TotalLockCount=3 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=2 HandleInUse=2 HandleReferenceCount=0
Concurrency: DependencyMutex=0x2db37f8038(0, 2, 0, 0) Mutex=0x2db37f80c8(0, 52, 0, 0)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=0x2db37f8018[0x2db37f8018,0x2db37f8018]
Pin=0x2db37f7ff8[0x2db37f7ff8,0x2db37f7ff8]
LoadLock=0x2db37f8070[0x2db37f8070,0x2db37f8070]
Timestamp: Current=10-27-2021 14:07:01
HandleReference: Address=0x2db37f8160 Handle=(nil) Flags=[00]
LibraryObject: Address=0x2db37ec100 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size='16'
Child: id='0' Table=0x2db37ecfb0 Reference=0x2db37ec9f8 Handle=0x2db37fd7a0
NamespaceDump:
Parent Cursor: sql_id=br5vrn7wkpm86 parent=0x2db37ec1a0 maxchild=1 plk=y ppn=n
发现该session执行的sql是grant execute any procedure to centernew ,但是从session state object里面看到program: JDBC Thin Client,一般情况JDBC连接上来的session不会去单独执行grant操作。
搜索0x2fc14626f0可以找到process so,并且可以看到当时的short_stack
SO: 0x2fc14626f0, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x2fc14626f0, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:119, ser:1, calls cur/top: 0x2dcae69d40/0x2dcae3d400
flags : (0x0) -
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 80
last post received-location: kji.h LINE:3691 ID:kjata: wake up enqueue owner
last process to post me: 0x2fc9458628 2 6
last post sent: 0 0 81
last post sent-location: kji.h LINE:3694 ID:kjatb: wake up enqueue blocker
last process posted by me: 0x2fc9458628 2 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x2fc173bcb0
O/S info: user: grid, term: UNKNOWN, ospid: 25668
OSD pid info: Unix process pid: 25668, image: oracle@qfjsdb1
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kqrigt()+1383<-kqrLockAndPinPo()+886<-kqrpre1()+982<-kqrpre()+19<-kzdugt()+255<-kzppsr()+342<-gradrv()+1419<-opiexe()+21971<-opiosq0()+3932<-opipls()+11961<-opiodr()+917<-rpidrus()+211<-skgmstack()+148<-rpiswu2()+690<-rpidrv()+1327<-psddr0()+473<-psdnal()+457<-pevm_EXIM()+305<-pfrinstr_EXIM()+53<-pfrrun_no_
calls cur/top: 0x2dcae69d40/0x2dcae3d400,继续搜索top 0x2dcae3d400看看是什么调用了该sql
SO: 0x2dcae3d400, type: 3, owner: 0x2fc14626f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x2fc14626f0, name=call, file=ksu.h LINE:12725, pg=0
(call) sess: cur 2fe20af948, rec 2fe20af948, usr 2fe20af948; flg:38 fl2:1; depth:0
svpt(xcb:(nil) sptn:0x403 uba: 0x00000000.0000.00)
ksudlc FALSE at location: 0
SO: 0x2dcb5bd670, type: 79, owner: 0x2dcae3d400, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x2fc14626f0, name=LIBRARY OBJECT PIN, file=kgl.h LINE:8755, pg=0
LibraryObjectPin: Address=0x2dcb5bd670 Handle=0x2dcb5e3770 Mode=S Lock=0x2dcb5b3b48 ClusterLock=0x2fcbf13520 Context=0x7f70cc168da8 User=0x2fe20af948 Session=0x2fe20af948 Count=3 Mask=0011 Flags=[08] SavepointNum=0x470
LibraryHandle: Address=0x2dcb5e3770 Hash=dd787fa5 LockMode=N PinMode=S LoadLockMode=0 Status=VALD
ObjectName: Name=CENTERNEW.DBT_ALL_FN_TRC_EXETCPU_REQ
FullHashValue=5638f5f26d1a229a59246106dd787fa5 Namespace=TABLE/PROCEDURE(01) Type=PROCEDURE(07) Identifier=313238 OwnerIdn=94
Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=1 ActiveLocks=1 TotalLockCount=2 TotalPinCount=2
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=10 HandleInUse=10 HandleReferenceCount=0
Concurrency: DependencyMutex=0x2dcb5e3820(0, 1, 0, 0) Mutex=0x2dcb5e38b0(0, 47, 0, 0)
发现是CENTERNEW.DBT_ALL_FN_TRC_EXETCPU_REQ存储过程调用的,由于systatedump是基于某个时刻,所以最好分析ash查询还有哪些存储过程对dc_users进行x模式请求。
SQL> select PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,SQL_ID,COUNT(*) FROM DBA_HIST_ACTIVE_SESS_HISTORY WHERE SAMPLE_TIME>=TRUNC(SYSDATE) AND EVENT='row cache lock' and p1 in (10) and p3=5 group by PLSQL_ENTRY_OBJECT_ID,SQL_ID,PLSQL_ENTRY_SUBPROGRAM_ID;
PLSQL_ENTRY_OBJECT_ID PLSQL_ENTRY_SUBPROGRAM_ID SQL_ID COUNT(*)
--------------------- ------------------------- ------------- ----------
6y6skvd1h9rn9 84
313229 1 2kbmwvdzf4vx8 527
313229 1 br5vrn7wkpm86 444
313230 1 2kbmwvdzf4vx8 1865
313230 1 br5vrn7wkpm86 616
313231 1 2kbmwvdzf4vx8 444
313231 1 br5vrn7wkpm86 748
313235 1 2kbmwvdzf4vx8 129
313235 1 br5vrn7wkpm86 601
313238 1 2kbmwvdzf4vx8 1609
313238 1 br5vrn7wkpm86 302
SQL> select owner,object_name,object_type from dba_objects where object_id in (313229,313230,313231,313235,313238)
OWNER OBJECT_NAME OBJECT_TYP
------------------------------ ---------------------------------------- ----------
CENTERNEW DBT_ALL_FN_TRC_EXETCPU_REQ PROCEDURE
CENTERNEW DBT_ALL_FN_RM_LHBU_REQ PROCEDURE
CENTERNEW DBT_ALL_FN_HIGHWAYEXITLIST PROCEDURE
CENTERNEW DBT_ALL_FN_HIGHWAYENTRYLIST PROCEDURE
CENTERNEW DBT_ALL_FN_ETC_TOLL_BASEINFO PROCEDURE
SQL> select sql_text from dba_hist_sqltext where sql_id in ('2kbmwvdzf4vx8','br5vrn7wkpm86');
SQL_TEXT
--------------------------------------------------------------------------------
grant execute any pr
grant create any tab
至此此案例分析完毕,从ash可以看到有5个存储过程中包含了grant user的语句,并且反复执行了多次导致了此次故障。