故障描述:
数据库用户FUBONWR连接数据库时,发起connect操作后,长时间没反映,过几分钟后弹出 ORA-03135:连接失去联系的错误,其他用户都正常.
诊断过程:
1),查看alert_mis*.log ,查看sqlnet.log,listener.log 均无异常.
2),查询数据库等待信息,发现有大量library cache lock阻塞:
select * from v$session_wait where event not like 'SQL%' and event not like 'rdbms%';
40 430 26691 ASM background timer
68 716 21501 class slave wait
51 534 49970 class slave wait
49 521 55425 class slave wait
64 690 58524 class slave wait
15 158 32171 class slave wait
19 185 61087 class slave wait
1 15 16879 db file sequential r
12 131 43281 DIAG idle wait comp
7 79 12365 DIAG idle wait comp
55 560 27286 gc cr request file
25 235 49570 GCR sleep lock
16 170 10636 gcs remote message
22 209 17538 gcs remote message
20 196 46374 gcs remote message
18 183 22806 gcs remote message
14 157 3440 ges remote message
8 95 4371 jobq slave wait
6 54 1618 jobq slave wait
52 535 4 library cache lock
43 459 4 library cache lock
42 445 4 library cache lock
54 548 5 library cache lock
45 472 4 library cache lock
47 496 3 library cache lock
46 484 3 library cache lock
50 522 4 library cache lock
48 510 4 library cache lock
69 718 3 library cache lock
67 705 3 library cache lock
66 703 4 library cache lock
77 808 4 library cache lock
75 796 4 library cache lock
73 769 4 library cache lock
65 691 4 library cache lock
59 640 4 library cache lock
58 639 4 library cache lock
56 601 4 library cache lock
62 665 4 library cache lock
61 664 4 library cache lock
60 651 3 library cache lock
26 237 4 library cache lock
24 224 4 library cache lock
23 212 4 library cache lock
29 302 4 library cache lock
28 288 4 library cache lock
27 275 4 library cache lock
11 120 4 library cache lock
10 106 4 library cache lock
3 28 4 library cache lock
21 198 4 library cache lock
17 172 4 library cache lock
13 146 4 library cache lock
30 314 4 library cache lock
37 394 4 library cache lock
35 380 4 library cache lock
38 406 4 library cache lock
41 432 4 library cache lock
39 418 4 library cache lock
34 366 4 library cache lock
31 328 4 library cache lock
32 341 4 library cache lock
3),查询3个节点的dba_kgllock信息,发现阻塞根源在mis3实例上:
SQL> select * from dba_kgllock where kgllkhdl='000000010B60CE68';
KGLLKUSE KGLLKHDL KGLLKMOD KGLLKREQ KGLLKTYPE
---------------- ---------------- -------------------- ---------
00000001694DCC18 000000010B60CE68 0 2 Lock
000000016848B1B0 000000010B60CE68 0 2 Lock
000000016F54AB78 000000010B60CE68 0 2 Lock
000000016845B730 000000010B60CE68 0 2 Lock
000000016948F508 000000010B60CE68 0 2 Lock
000000016A63F820 000000010B60CE68 0 2 Lock
...
000000016F603628 000000010B60CE68 0 2 Lock
00000001685B7EF8 000000010B60CE68 0 2 Lock
000000016A500CE8 000000010B60CE68 0 2 Lock
00000001694F19B0 000000010B60CE68 0 2 Lock
00000001694F19B0 000000010B60CE68 0 2 Lock
000000016A4DA160 000000010B60CE68 3 0 Lock
164 rows selected
阻塞者为会话标识'000000016A4DA160'
4),查询会话标识'000000016A4DA160'的会话信息:
SQL> select * from gv$session where saddr='000000016A4DA160';
SADDR SID SERIAL# AUDSID PADDR USER# USERNAME
---------------- ---------- ---------- ---------- ---------------- ---------- ------------------------------
000000016A4DA160 275 4053 11749970 000000016F417E68 0
SQL> select * from v$process where addr='000000016F417E68';
ADDR PID SPID PNAME USERNAME SERIAL# TERMINAL
---------------- ---------- ------------------------ ----- --------------- ---------- ----------------------
000000016F417E68 85 24231 grid 176 UNKNOWN
发现这是grid操作系统用户发起的后台进程.其正在遭遇锁等待事件"enq: TX - row lock contention ",查询其正在做的事情
SQL> select * from v$sql where sql_id='b84cknyvnyq25';
SQL_TEXT SQL_FULLTEXT SQL_ID
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------
update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date b84cknyvnyq25
update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date b84cknyvnyq25
update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date b84cknyvnyq25
5),查询grid用户发起的这个进程的持锁信息:
SQL> select * from v$lock where sid=275;
ADDR KADDR SID TYPE ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- ---- ---------- ---------- ---------- ---------- ---------- ----------
000000016F7D2498 000000016F7D24F0 275 TX 2228236 11819 0 6 11950 0
00002ACF88E39588 00002ACF88E395E8 275 TM 22 0 3 0 11950 2
SQL> select * from v$lock where id1=2228236 and id2=11819;
ADDR KADDR SID TYPE ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- ---- ---------- ---------- ---------- ---------- ---------- ----------
000000016F7D2498 000000016F7D24F0 275 TX 2228236 11819 0 6 12001 0
000000016F7D4D20 000000016F7D4D78 223 TX 2228236 11819 0 6 13559 0
000000015BD8B120 000000015BD8B198 198 TX 2228236 11819 6 0 61180 1
发现grid发起的这个进程被sid为198的会话所阻塞.
6),sid为198的会话信息:
SQL> select * from v$session where sid=198;
SADDR SID SERIAL# AUDSID PADDR USER# USERNAME COMMAND OWNERID TADDR LOCKWAIT
---------------- ---------- ---------- ---------- ---------------- ---------- ------------------------------ ---------- ---------- ---------------- -----------
00000001694AD198 198 10253 11749433 00000001693A55E8 62 FUBONWR 0 2147483644 000000015BD8B120
SQL> select * from v$sql where sql_id='29bdpm894vydq';
SQL_TEXT SQL_FULLTEXT
-------------------------------------------------------------------------------- ------------------------------------------------------------------------------
declare Lines sys.dbms_output.chararr; begin :NumLines := 10; sys.dbms_out declare
Lines sys.dbms_output.chararr;
begin
:NumLines := 10;
sys.dbms
发现这是个开发人员用pl sql dev连上来的会话, 当前有个事务没有提交, 但没找到相关人员, 不知道其执行了什么样的未提交事务, 只查询到上述前一个操作的sql,无特别的意义.
7),没有做进一步的诊断, 清理掉sid为198的非关键会话.
SQL> alter system kill session '198,10253';
System altered
SQL>
8),清理掉上述会话后, library cache lock会话开始释放, 最终回复正常.