[oracle@kdexa1db01 (orarpt1)~]$ tnsping anbob2
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:39:14
Copyright (c) 1997, 2011, Oracle. All rights reserved.
Used parameter files:
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.60.16)(PORT = 1521)) (CONNECT_DATA = (SID = anbobc2)))
OK (10 msec)
[oracle@kdexa1db01 (orarpt1)~]$ sqlplus weejar@anbob2
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jun 5 10:40:36 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter password:
ERROR:
ORA-12519: TNS:no appropriate service handler found复制
TIP:
通常当出现ora-12519 or 12516时都是因为数据库进程数超过了数据库参数processes 或 sessions 时, 并且通常在db alert 中出现ora-20 or ora-18 的错误信息,如果当时查看监听服务状态使用lsnrctl service 会发现service 当时是”blocked”状态, listener 进程是有pmon 进程在不断的获得DB 的最新负载及连接数和db 的processes parameter, pmon 的更新前不是每秒,通常在5s-10分钟之间,负载变化大时会自己加快更新频率。 listener自己知道会话增加的数量,但是当会话断开后listener不会立即知道, 也就是当listener 上的service 在”blocked”状态后,知道下次pmon 进程通知现在当前连接小于processes parameter 值时(当然sessions 也是限制,包括SQL解析前系统的递归会话),新的连接请求才能创建成功。 今天遇到了一个案例,这里分享一下我处理思路。 希望对你有用。
先观察问题当时的数据库资源使用情况
SQL> select * from v$resource_limit;
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ ------------------- --------------- ---------- ----------
processes 1057 1247 4000 4000 <<<
sessions 1093 1249 4405 4405 <<<
enqueue_locks 21693 53190 53190 53190
enqueue_resources 780 1039 20424 UNLIMITED
ges_procs 1056 1245 4001 4001
ges_ress 0 0 82503 UNLIMITED
ges_locks 0 0 128193 UNLIMITED
ges_cache_ress 7417 95371 0 UNLIMITED
ges_reg_msgs 1223 2996 8750 UNLIMITED
ges_big_msgs 245 442 1934 UNLIMITED
ges_rsv_msgs 0 0 1000 1000
gcs_resources 1400382 2253096 2421479 2421479
gcs_shadows 726267 1520777 2421479 2421479
dml_locks 126 2469 19380 UNLIMITED
temporary_table_locks 8 352 UNLIMITED UNLIMITED
transactions 16 2472 4845 UNLIMITED
branches 37 136 4845 UNLIMITED
cmtcallbk 0 3 4845 UNLIMITED
sort_segment_locks 106 125 UNLIMITED UNLIMITED
max_rollback_segments 170 185 4845 65535
max_shared_servers 0 0 UNLIMITED UNLIMITED
parallel_max_servers 26 42 40 3600复制
TIP:
当时的数据库进程和会话数实际并未达到processes parameter 的50%,使用count v$session也可以确认。那我们可以在DB本机绕过LISTENER,尝试创建一个新的连接是否成功?
[oracle:/home/oracle]# sqlplus weejar/weejar_dba123
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:01 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
[oracle:/home/oracle]# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:20 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
ERROR:
ORA-12516: TNS:listener could not find available handler with matching protocol
stack复制
NOTE:
通过上面尝试可以确认,数据库实际并发达到最大连接数限制, 连接时ora-125xx错误应该是 listener上的限制,下面查看监听状态。
[oracle:/home/oracle]# ps -ef|grep lsnr
grid 8519706 1 0 May 15 - 17:08 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle 26411174 34931032 0 10:45:30 pts/2 0:00 grep lsnr
grid 12058930 1 0 May 15 - 0:32 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
[oracle:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:45
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date 15-MAY-2015 02:26:10
Uptime 21 days 8 hr. 19 min. 36 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl service
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:55
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:190741 refused:0 state:blocked <<<<<<<<
LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:190741 refused:0 state:blocked
LOCAL SERVER
The command completed successfully复制
TIP:
这个库是10.2.0.5 的DB, 使用的是11.2.0.3的GI, 监听在grid 的用户下。 当时的服务在监听上,只是service 的状态已经是”blocked”, 当然有可能是主机资源问题导致无法创建新连接,在早前的版本中还有可能是listener.log文件的过大,这两个原因都已排除。
+-topas_nmon--l=LongTerm-CPU-----Host=anbob2---------Refresh=2 secs---10:51.45---------------
| Memory ------------------------------------------------------------------------------------
| Physical PageSpace | pages/sec In Out | FileSystemCache
|% Used 58.3% 0.3% | to Paging Space 0.0 0.0 | (numperm) 9.7%
|% Free 41.7% 99.7% | to File System 0.0 1.5 | Process 42.9%
|GB Used 70.0GB 0.2GB | Page Scans 0.0 | System 5.7%
|GB Free 50.0GB 63.8GB | Page Cycles 0.0 | Free 41.7%
|Total(GB) 120.0GB 64.0GB | Page Steals 0.0 | ------
| | Page Faults 8280.1 | Total 100.0%
|------------------------------------------------------------ | numclient 9.7%
|Min/Maxperm 3574MB( 3%) 23827MB( 20%) <--% of RAM | maxclient 10.0%
|Min/Maxfree 960 1088 Total Virtual 184.0GB | User 49.6%
|Min/Maxpgahead 2 8 Accessed Virtual 57.9GB 31.4%| Pinned 8.6%
| 20.0 | lruable pages 304991复制
下面查看CRS资源状态。
anbob2:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE anbob1
ONLINE ONLINE anbob2
ora.asm
OFFLINE OFFLINE anbob1
OFFLINE OFFLINE anbob2
ora.gsd
OFFLINE OFFLINE anbob1
OFFLINE OFFLINE anbob2
ora.net1.network
ONLINE ONLINE anbob1
ONLINE ONLINE anbob2
ora.ons
ONLINE ONLINE anbob1
ONLINE ONLINE anbob2
ora.registry.acfs
ONLINE OFFLINE anbob1
ONLINE OFFLINE anbob2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE anbob2
ora.cvu
1 ONLINE ONLINE anbob2
ora.anbob1.vip
1 ONLINE ONLINE anbob1
ora.anbob2.vip
1 ONLINE ONLINE anbob2
ora.oc4j
1 ONLINE ONLINE anbob2
ora.scan1.vip
1 ONLINE ONLINE anbob2复制
TIP:
现在时间紧迫,前先需要恢复连接, 此时的节点1还好是正常,新的连接被转移到了节点2, 现在可以猜测有可能是listener.ora 上的连接数信息不正确, 尝试重启listener
su - grid
srvctl stop listener -n anbob2
srvctl start listener -n anbob2
su - oracle
alter system register;复制
NOTE:
再次查看listener的状态,发现服务没有注册上,数据库登录尝试手动注册也未成功,包括使用tnsnames.ora + local_listener的方式尝试也未成功(default port:1521 not need),动态注册不成功,我们放弃这种方式,使用静态注册,注册11g的监听和以前版本的区别,这里我使用了netmgr图形化操作,操作细节未记录。静态注册成功后,再次重启监听,下面再尝试连接。
[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date 05-JUN-2015 12:07:11
Uptime 0 days 5 hr. 13 min. 59 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 13:28:42 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL>复制
NOTE:
注意到这里新的连接已经成功。松了口气, 下面开始分析当时的原因,为什么监听连接数会不正确?为什么动态注册service不能注册到Listener上?这些从我任务从一开始我们就有说过是有pmon完成的。下面排查是不是pmon Background process 出了问题。
# DB alert no ora-18 and ora-20
Fri Jun 05 10:45:46 BEIST 2015
PMON failed to acquire latch, see PMON dump
Fri Jun 05 10:54:38 BEIST 2015
# pmon trace
*** 2015-06-05 10:45:46.237
PMON unable to acquire latch 700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
Location from where latch is held: kjrmas1: lookup master node:
Context saved from call: 504403196568791072
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
1064 (0, 1433476592, 0)
1051 (0, 1433476592, 0)
1058 (0, 1433476592, 0)
1048 (0, 1433476592, 0)
1050 (0, 1433476592, 0)
1055 (0, 1433476592, 0)
1031 (0, 1433476592, 0)
waiter count=7
gotten 20376414 times wait, failed first 10108322 sleeps 15878606 <<<<<<<<<<<<<<<<<<
gotten 11598 times nowait, failed: 59605
recovery area:
Dump of memory from 0x0700000AD021B280 to 0x0700000AD021B280
possible holder pid = 1066 ospid=46072128 <<<<<<<<<<<<<<<<<<<<<
----------------------------------------
SO: 700000ad571b5e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=1066, calls cur/top: 700000a7756c988/700000a333d74a8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 55
last post received-location: kjata: wake up enqueue owner
last process to post me: 700000ad86f6730 1 6
last post sent: 0 0 108
last post sent-location: kjmpost: post lmd
last process posted by me: 700000ad86f6730 1 6
(latch info) wait_event=0 bits=2
holding (efd=16) 700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
Location from where latch is held: kjrmas1: lookup master node:
Context saved from call: 504403196568791072
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
1064 (0, 1433476592, 0)
1051 (0, 1433476592, 0)
1058 (0, 1433476592, 0)
1048 (0, 1433476592, 0)
1050 (0, 1433476592, 0)
1055 (0, 1433476592, 0)
1031 (0, 1433476592, 0)
waiter count=7
Process Group: DEFAULT, pseudo proc: 700000ad47e8d68
O/S info: user: oracle, term: UNKNOWN, ospid: 46072128
OSD pid info: Unix process pid: 46072128, image: oracle@anbob2 (P015)
Short stack dump:
ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-kjskchcv+00f0<-kjuocl+0750<-kjusuc+05d0<-ksipgetctx+04fc<-ksqcmi+1b0c<-ksqgtlctx+1214<-ksqgelctx+0358复制
TIP:
可以看到问题时段,果然pmon进程有些busy, 取不到latch ges resource hash list, blocker是spid=46072128,pid = 1066进程
[oracle@anbob2:/home/oracle]#ps -ef|grep 46072128
oracle 19071202 45220332 0 15:07:58 pts/1 0:00 grep 46072128
oracle 46072128 1 10 Jun 03 - 311:14 ora_p015_anbobc2
SQL> @uopid 1066
USERNAME SID AUDSID OSUSER MACHINE PROGRAM
PID SPID SQL_HASH_VALUE STATUS
----------------------- -------------- ----------- ---------------- ------------------ --------------------
---------- ------------ -------------- --------
REPORT '3609,36916' 262269818 report kybb1 (P015)
1066 46072128 1746131492 ACTIVE
SQL> @usid 3609
USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID
OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- --------------
------ ------------ ------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
REPORT '3609,36916' 262269818 report kybb1 (P015) 46072128
1066 46072128 8av2z2xn17qj4 1746131492 218808 ACTIVE 0700000AD4AA4250 0700000AD571B5E8 0700000A99E44CB8 20150603 02:24:40
SQL> @xi 8av2z2xn17qj4 %
eXplain the execution plan for sqlid 8av2z2xn17qj4 child %...
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID 8av2z2xn17qj4, child number 0
-------------------------------------
create table xxxx_315 nologging as select /*+use_hash(a b) parallel(a 8)
no_index(a)*/ 20150602 cycle,315 region,subsid usernum,acctid,nvl(productgroup,'PrdGrpGlobal')
productgroup,nvl(productid,'gl.g.nml') productid xxxxx --省略
Plan hash value: 2866634249
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Pstart| Pstop | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | | | | | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 154M| | | | | |
| 3 | LOAD AS SELECT | | | | | 256K| 256K| |
|* 4 | HASH JOIN OUTER | | 154M| | | 2047M| 113M| |
| 5 | PX RECEIVE | | 154M| | | | | |
| 6 | PX SEND HASH | :TQ10000 | 154M| | | | | |
| 7 | PX BLOCK ITERATOR | | 154M| 6 | 6 | | | |
|* 8 | TABLE ACCESS FULL| xxxx_315 | 154M| 6 | 6 | | | |
| 9 | PX RECEIVE | | 1 | | | | | |
| 10 | PX SEND HASH | :TQ10001 | 1 | | | | | |
| 11 | PX BLOCK ITERATOR | | 1 | 34 | 34 | | | |
|* 12 | TABLE ACCESS FULL| SUBSCRIBER | 1 | 34 | 34 | | | |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("XXXX"."REGION"="SUBSCRIBER"."REGION" AND
"XXXX"."SUBSID"="SUBSCRIBER"."OID")
8 - access(:Z>=:Z AND :Z<=:Z) filter(("REGION"=315 AND "WRTOFFSN"=0)) 12 - access(:Z>=:Z AND :Z<=:Z) filter(("CYCLE"=201505 AND "REGION"=315)) SQL> @px
Show current Parallel Execution sessions in RAC cluster...
QC_SID QCINST_ID USERNAME SQL_ID DEGREE REQ_DEGREE SLAVES INST_CNT MIN_INST MAX_INST
------------- ---------- ---------- ------------------ ---------- ---------- ---------- ---------- ---------- ----------
3864,47259 2 REPORT 8av2z2xn17qj4 8 8 16 1 2 2
3864,47259 2 REPORT 19gg34w9r4r5m 8 8 8 1 2 2
3560,39391 2 SYS 6643dd2jtv7jh 2 2 2 2 1 2
SQL> @usid 3864
USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID
SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------
------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
REPORT '3864,47259' 262269818 report kybb1 (TNS V1-V3) 27721980 544 27156
8av2z2xn17qj4 1746131492 220783 ACTIVE 0700000AD8AFDB78 0700000AD271DA38 0700000A9E8B74D8 20150603 02:22:13复制
查看当前的活动会话(FOREGROUND)
USERNAME USID MACHINE PROGRAM EVENT LAST_CALL_ET SQL_ID WT_SECINW SQLTEXT
---------- ---------- ---------- -------------- -------------------- ------------ ------------------ ---------- ------------------------------
ACCOUNT 4140 kdbdsc4 intf_accsrv@kd db file sequential r 0 4t9v2u4zdspxd 0:0 select a.scoretypeid,to_char(s
ACCOUNT 3221 kdbdsc3 intf_accsrv@kd db file sequential r 0 4t9v2u4zdspxd 0:0 select a.scoretypeid,to_char(s
USERINFO 3889 kycbe13 JDBC Thin Clie SQL*Net message from 0 33hk2fjysmd3w -1:0 update ENTITY_SYNC_LOG set sen
REPORT 4158 kybb1 task@kybb1 (TN latch: ges resource 3723 2mk5dt5xjg7zt 0:0 INSERT INTO TB_BALANCE_FLOW_LO
REPORT 3819 kybb1 task@kybb1 (TN latch: ges resource 3723 6n30d53a4ts94 0:0 insert into tb_collect_log_red
REPORT 4110 kybb1 task@kybb1 (TN latch: ges resource 3723 6n30d53a4ts94 0:0 insert into tb_collect_log_red
REPORT 4169 kybb1 task@kybb1 (TN latch: ges resource 3723 2mk5dt5xjg7zt 0:0 INSERT INTO TB_BALANCE_FLOW_LO
REPORT 4042 kybb1 task@kybb1 (TN latch: ges resource 3848 acn50cgh015ba 0:0 insert into tb_user_charge_d
REPORT 4191 kybb1 task@kybb1 (TN latch: ges resource 3848 d9hjdb4nkg18t 0:0 insert into tb_user_charge_d
REPORT 3304 kybb1 task@kybb1 (TN latch: ges resource 139542 b5anr2qjt2h9k 2:0 insert /*+append*/into stat_ba
REPORT 4002 kybb1 task@kybb1 (TN latch: ges resource 172083 cfyrg0hs1qh3a 0:0 UPDATE SUBSCRIBER S SET S.VIPI
REPORT 3584 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3718 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3285 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3394 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3830 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3652 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3614 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 27:0 create table xxxx_315
REPORT 3609 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3864 kybb1 task@kybb1 (TN PX Deq: Execute Repl 219248 8av2z2xn17qj4 0:0 create table xxxx_315
REPORT 3596 kybb1 oracle@anbob2 PX Deq: Execution Ms 219247 19gg34w9r4r5m 0:1 SELECT * FROM REGION_LIST WHER
REPORT 3850 kybb1 oracle@anbob2 PX Deq: Execution Ms 219247 19gg34w9r4r5m 0:2 SELECT * FROM REGION_LIST WHER
SYS 3560 anbob2 sqlplus@anbob2 SQL*Net message to c 3 d2b6tdq5jmju3 -1:3 select /*+rule*/ ses.use复制
Tip:
可以看到当前连接会话中latch ges resource的也就两个program,其中有个create table 的进程使用了8个parallel processes。而且这个会话已经持续执行了快22万秒,下面使用Poder的脚本查看是否有latch 级联阻塞。
SQL> @ash_wait_chains session_id||':'||program2||event2 session_type='FOREGROUND' sysdate-10/24/60 sysdate
-- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
%This SECONDS AAS WAIT_CHAIN
------ ---------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1% 599 1 -> 3761:(task) latch: ges resource hash list
1% 599 1 -> 3797:(task) ON CPU
1% 597 1 -> 3474:(intunit_get_sq) library cache lock
1% 597 1 -> 3869:(task) library cache lock
1% 597 1 -> 3303:(task) library cache lock
1% 574 1 -> 3261:(sqlplus) SQL*Net message from dblink
1% 527 .9 -> 4169:(task) latch: ges resource hash list
1% 524 .9 -> 3464:(sqlplus) SQL*Net message from dblink
1% 523 .9 -> 3718:(Pnnn) latch: ges resource hash list
1% 521 .9 -> 3652:(Pnnn) latch: ges resource hash list
1% 521 .9 -> 4002:(task) latch: ges resource hash list
1% 521 .9 -> 3584:(Pnnn) latch: ges resource hash list
1% 521 .9 -> 3304:(task) latch: ges resource hash list
1% 518 .9 -> 3830:(Pnnn) latch: ges resource hash list
1% 517 .9 -> 4299:(sqlplus) db file sequential read
1% 511 .9 -> 3394:(Pnnn) latch: ges resource hash list
1% 510 .9 -> 3609:(Pnnn) latch: ges resource hash list
1% 509 .8 -> 4191:(task) latch: ges resource hash list
1% 509 .8 -> 3780:(sqlplus) db file sequential read
1% 507 .8 -> 4110:(task) latch: ges resource hash list
1% 506 .8 -> 3614:(Pnnn) latch: ges resource hash list
1% 504 .8 -> 4158:(task) latch: ges resource hash list
1% 503 .8 -> 3301:(sqlplus) db file sequential read
1% 502 .8 -> 3895:(sqlplus) ON CPU
1% 499 .8 -> 3285:(Pnnn) latch: ges resource hash list
1% 497 .8 -> 3819:(task) latch: ges resource hash list
1% 496 .8 -> 4042:(task) latch: ges resource hash list
1% 471 .8 -> 4193:(sqlplus) SQL*Net message from dblink
1% 441 .7 -> 2975:(oracle) db file sequential read
1% 344 .6 -> 2928:(sqlplus) db file scattered read
30 rows selected.
SQL> @uspid 3867094
USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID SQL_HASH_VALUE LASTCALL STATUS
----------------------- -------------- ----------- ---------------- ------------------ -------------------- ------------ -------------- ---------- --------
'4405,1' 0 oracle anbob2 oracle@anbob2 (PMON) 3867094 0 1865174 ACTIVE
SQL> oradebug setospid 3867094
Oracle pid: 2, Unix process pid: 3867094, image: oracle@anbob2 (PMON)
SQL> oradebug short_stack
ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<
-kskthbwt+0280<-kslewat+01bc<-kslges+04b8<-kslgetl+0384<-kjuscl+0390<-ksiprls+0178<-ksqcmi+2a9c<-ksqrcl+051c<-ksqdeli+014c
<-ksqdel+0014<-ksqsod+00d4<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014
<-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-ksudlp+018c<-kssxdl+0350<-ksuxdl+02a0
<-ksuxda+028c<-ksucln+085c<-ksbrdp+04ec<-opirip+041c<-opidrv+0478<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 SQL> @s 4405
SID SQLID_AND_CHILD STATUS STATE EVENT SEQ# SEC_IN_WAIT BLOCKING_SID P1
P2 P3 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------
------------------ ------------------ ------------------------------------------
4405 0 ACTIVE WAITING latch: ges resource hash list 33650 0 UNKNOWN address=
number= 66 tries= 39 0x700000AD1A9EE98: ges resource hash list[
# awr
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 50106 05-Jun-15 15:30:42 837 16.7
End Snap: 50107 05-Jun-15 16:00:31 838 16.8
Elapsed: 29.82 (mins)
DB Time: 691.71 (mins)
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: ges resource hash list 166,505 25,783 155 62.1 Other
db file sequential read 1,831,050 10,326 6 24.9 User I/O
CPU time 4,056 9.8
gc current block 2-way 1,173,399 904 1 2.2 Cluster
gc cr grant 2-way 847,244 468 1 1.1 Cluster
-------------------------------------------------------------复制
NOTE:
从上面的信息可以发现当前的pmon 进程在等待中,从call stack 可以确认,wait event 是’latch: ges resource hash list’, 也是当前实例的top event, 猜测与当前的CTAS 的会话有关,查询了MOS与该事件相关存在一个bug
Bug 11690639 – High enqueue activity results in “latch: ges resource hash list” waits (文档 ID 11690639.8)
处理方法21万秒那个会话存在异常,kill了那个会话,该事件消失。
SQL> @a
A-Script: Display active sessions...
COUNT(*) SQL_ID STATE EVENT
---------- ------------- ------- ----------------------------------------------------------------
5 4t9v2u4zdspxd WAITING db file sequential read
2 9z11d7hxfb1u6 WAITING db file sequential read
2 2mk5dt5xjg7zt WAITING db file sequential read
1 4t9v2u4zdspxd WAITING gc cr request
1 7gtuunjsd8cws WAITING db file sequential read
SQL> @s 4405
SID SQLID_AND_CHILD STATUS STATE EVENT SEQ# SEC_IN_WAIT BLOCKING_SID P1 P2 P3 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------
4405 0 ACTIVE WORKING On CPU / runqueue 38981 19 UNKNOWN复制
NOTE:
可以看到pmon 进程当前的event 是’on cpu’ ,恢复了正常的状态, 下面检查监听,动态注册也恢复了正常。
[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date 05-JUN-2015 12:07:11
Uptime 0 days 5 hr. 13 min. 59 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl service
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:29
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 2 instance(s).
Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:1969 refused:0
LOCAL SERVER
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:128 refused:0 state:ready
LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
Instance "anbobc2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:128 refused:0 state:ready
LOCAL SERVER
The command completed successfully复制
总结:
当前该bug还无法确认,不过可以确认的是CTAS 长期持有latch: ges resource hash list ,堵塞了pmon 对latch 的获取,影响了对listener的注册和service_update,listener 中的会话信息只增加不更新已释放,慢慢导致了最后的service 状态blocked, 致使会话通过监听时报ora-12519 and 12516.
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
785次阅读
2025-04-18 14:18:38
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
662次阅读
2025-04-15 17:24:06
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
591次阅读
2025-04-20 10:07:02
【ORACLE】你以为的真的是你以为的么?--ORA-38104: Columns referenced in the ON Clause cannot be updated
DarkAthena
544次阅读
2025-04-22 00:13:51
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
530次阅读
2025-04-17 17:02:24
【ORACLE】记录一些ORACLE的merge into语句的BUG
DarkAthena
509次阅读
2025-04-22 00:20:37
一页概览:Oracle GoldenGate
甲骨文云技术
495次阅读
2025-04-30 12:17:56
火焰图--分析复杂SQL执行计划的利器
听见风的声音
468次阅读
2025-04-17 09:30:30
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
382次阅读
2025-04-15 14:48:05
OR+DBLINK的关联SQL优化思路
布衣
381次阅读
2025-05-05 19:28:36
热门文章
移除DataGuard Standby配置导致Primary启动失败
2023-08-17 21366浏览
使用dblink产生的”SELECT /*+ FULL(P) +*/ * FROM XXXXX P ” 解析
2023-06-20 20909浏览
Troubleshooting 'ORA-28041: Authentication protocol internal error' change password 12c R2 DB
2020-04-08 13709浏览
浅谈ORACLE免费数据库Oracle Database XE (Express Edition) 版
2018-10-31 7702浏览
High wait event ‘row cache mutex’ in 12cR2、19c
2020-08-14 5616浏览