在使用Oracle数据库的过程中,可能会遇到各种各样的错误或异常,很多异常的提示并不具体,我们有必要了解一下Oracle的ErrorStack跟踪方式。
ErrorStack是Oracle提供的一种对于错误堆栈进行跟踪的方法,通过设置跟踪可以将一些错误的后台信息详尽地转储出来,写入跟踪文件,对于错误的研究与诊断非常有效。
设置ErrorStack主要有4个级别:
0 - 仅转储错误堆栈(0级已经被逐渐废弃) 1 - 转储错误堆栈和函数调用堆栈 2 - Level1 + ProcessState 3 - Level2 + Context area(显示所有cursors,着重显示当前cursor)
复制
ErrorStack可以在实例级或会话级别设置,也可以在参数文件中设置,这个设置仅当某个特定的错误出现时才被触发,如设置ORA-00942事件的跟踪:
alter session set events '942 trace name errorstack level 1';
复制
一个客户曾经出现如下ORA-01438错误,提示数据的精度超过允许值,是后台JOB调度的任务:
Mon Jul 13 10:27:312009 Errors in file /admin/erpdb/bdump/erpdb1_j000_447020.trc: ORA-12012: error on auto execute of job 22 ORA-01438: value larger than specified precision allowed for this column ORA-06512: at "ERP.TIMRDU", line 13 ORA-06512: at line 1
复制
跟踪文件中默认的不会记录具体的SQL、绑定变量等信息,我们可以通过ErrorStack进行后台跟踪,获得更详细的信息,执行如下代码中的SQL:
alter system set events='1438 trace name errorstack forever,level 3';
复制
然后可以手工执行出错的存储过程(代码如下),获得跟踪文件,再关闭跟踪:
alter system set events='1438 trace name errorstack off';
复制
在Oracle 10g中,这样的操作会被记录到日志文件中:
Mon Jul 13 10:48:392009 OS Pid: 541528 executed alter system set events '1438 trace name Errorstack forever,level 3' Mon Jul 13 10:56:06 2009 Errors in file /admin/erpdb/udump/erpdb1_ora_267056.trc: ORA-01438: value larger than specified precision allowed for this column Mon Jul 13 10:56:08 2009 Trace dumping is performing id=[cdmp_20090713105608] Mon Jul 13 10:57:15 2009 OS Pid: 541528 executed alter system set events '1438 trace name Errorstack off'
复制
接下来分析获得的跟踪文件,就可以获得SQL文本线索,找到根本问题。在这个案例中,我们得到的跟踪文件,其关键SQL内容如下,通过这个SQL对照数据表很快就找到了精度超过的Number型字段:
***SESSION ID:(857.16304) 2009-07-13 10:56:06.429 *** 2009-07-13 10:56:06.429 ksedmp: internal or fatal error ORA-01438: value larger than specified precision allowed for this column Current SQL statement for this session: INSERT INTO CONTRAPAYM (IHCODE,GTICODE,IDX,HCODE,PORDATE,FCODE,FCY,CCODE,ECODE,FLAG,MATCHFLAG,BCODE,MONTHZL,STATUS,AUTOFLAG ,REMARK) SELECTDISTINCT IHCODE,'0000000000000000' GTICODE,(IDX+100) IDX,HCODE,:B3 PORDATE,FCODE,:B2FCY,CCODE,ECODE, FLAG,0 MATCHFLAG,BCODE,MONTHZL,STATUS,4 AUTOFLAG ,'Proc_AdjustContractpayment' REMARK FROM CONTRAPAYM WHERE IHCODE=:B1 ANDIDX=(SELECT MAX(IDX) FROM CONTRACTPAYMENT WHERE IHCODE=:B1 ) AND ROWNUM=1 ----- PL/SQL Call Stack ----- object line object handle number name 7000002ca366e80 100 procedure ERP.PROC_ADJCONTRAPAYM 700000336a1a070 236 procedure ERP.PROC_AUTOBATPROC 7000002ca367df0 5 procedure ERP.TIMRDU 700000342eb7c20 1 anonymous block
复制
在跟踪文件中,还有大量的堆栈信息,对于复杂的问题,还可以通过进一步细致的堆栈分析进行深入追踪。
可以很容易地测试这一功能的使用,比如使用如下代码中的测试过程:
SQL> alter system set events='1438 trace name errorstack forever,level 3'; SQL> connect eygle/eygle SQL> create table t (name varchar2(10),id number); Table created. SQL> insert into t values(a,1); insert into t values(a,1) * ERROR at line 1: ORA-00984: column not allowed here SQL> alter system set events '984 trace name errorstack off';
复制
在告警日志文件中就可以获得如下信息:
Mon Jul 13 22:55:592009 OS Pid: 2431 executed alter system set events '984 trace name ERRORSTACK level3' Mon Jul 13 22:59:12 2009 Errors in file /opt/oracle/admin/mmstest/udump/mmstest_ora_2520.trc: ORA-00984: column not allowed here Mon Jul 13 23:01:01 2009 OS Pid: 2431 executed alter system set events '984 trace name errorstack off'
复制
获得的跟踪文件里记录了insert的相关信息:
*** 2009-07-13 22:59:12.928 ksedmp: internal or fatal error ORA-00984: column not allowed here Current SQL statement for this session: insert into t values(a,1) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (?means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst()+27 all ksedst1() 0? 1 ?
复制
以下引用在ITPUB上的一个讨论,是使用ErrorStack跟踪解决问题的典型案例:
问题概述
在进行数据库例行维护时,遇到数据库正常导出,但导入时遇到有张表不能导入的问题。这张表存储工艺路线,大约共有160万条记录。表的不能导入影响一部分功能的实现,各单位反映强烈。
初步分析
首先排除是系统bug所带来的问题,因为前期导入导出未发现问题;另外由于是对同一数据库进行的导入导出,也排除了字符集问题。
单独再次导入此表,出现错误如下;
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008 Copyright (c) 1982, 2005, Oracle.All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options Export file created by EXPORT:V10.02.01 via conventional path import done in ZHS16GBK character set and ZHS16GBK NCHAR character set . importing DBA_MGR's objects into DBA_MGR . . importing table "T_ROUTDEF" IMP-00058: ORACLE error 1438 encountered ORA-01438: value larger than specified precision allowed for this column IMP-00028: partial import of previous table rolled back: 20501 rows rolled back
复制
解决方向:有可能是数据中存在问题,很有可能是输入一些特殊字符,这些字符和ORACLE自身使用的通配符一致,所以导致系统不能识别。
问题解决
1、对问题进行跟踪
对系统设置1438 errorstack跟踪:
SQL> alter system set events='1438 trace name Errorstack forever,level 10'; System altered
复制
然后重新执行IMP;
2、查看UDUMP下日志,如下:
/opt/oracle/admin/orcl/udump/orcl_ora_32355.trc Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /opt/oracle/product/10.2.0 System name:Linux Node name:zhx25.cccc.com Release:2.6.9-22.ELsmp Version:#1 SMP Mon Sep 19 18:00:54 EDT 2005 Machine:x86_64 Instance name: orcl Redo thread mounted by this instance: 1 Oracle process number: 17 Unix process pid: 32355, image: oracle@zhx25.cccc.com (TNS V1-V3) *** SERVICE NAMESYS$USERS) 2008-03-18 17:11:02.014 *** SESSION ID135.949) 2008-03-18 17:11:02.014 *** 2008-03-18 17:11:02.014 ksedmp: internal or fatal error ORA-01438: value larger than specified precision allowed for this column Current SQL statement for this session: INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "T_ROUTDEF" ("PRTNO", "SEQNO", "SEQDESC", "PRTEFLAG", "SEQCC", "SEQNXTNO", "SEQDEPT", "WKCCODE", "WKCALTCODE", "SEQSETHR", "LAB_SEQSETHR", "SEQRUNHR", "LAB_SEQRUNHR", "SEQQUEHR", "LAB_SEQQUEHR", "SEQTRANHR", "LAB_SEQTRANHR", "SEQNOQLTRAT", "SEQCHGDAT", "SEQCHGRSN", "SEQCHGCTLR", "MACHCODE", "MACHALTCODE", "RESCODE", "KEYSEQ", "SEQCRYLOT", "INITOVERRATE", "COMWORK", "SEQLOTFLAG", "SEQEFDAT", "SEQIEFDAT", "SEQEFLOTNO", "SEQIEFLOTNO", "SEQECONO", "PRTSEQCST", "SEQSELFCST", "INVYORN", "SEQQTY", "SEQNOTE", "NOPRTAVELZ", "OUTSEQFLAG", "WXCST", "SEQGSCHGCTLR", "SEQGSCHGDAT", "SEQSHCTL", "SEQSHDAT", "SEQSHSIGN", "PRICE", "ACCNO", "ROUSTAT", "FIXMFLAG", "BANCL" VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29, :30, :31, :32, :33, :34, :35, :36, :37, :38, :39, :40, :41, :42, :43, :44, :45, :46, :47, :48, :49, :50, :51, :52) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) ………… oacdef = 6a9477a0 Offsi = 48, Offsi = 1440 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a9477d0 Offsi = 48, Offsi = 1472 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a947800 Offsi = 48, Offsi = 1504 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a947830 Offsi = 48, Offsi = 1536 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a947860 Offsi = 48, Offsi = 1568 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a947890 Offsi = 48, Offsi = 1600 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy oacdef = 6a9478c0 Offsi = 48, Offsi = 1632 kkscoacd Bind#0 oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=852 siz=3344 off=0 kxsbbbfp=2a974dc398 bln=32 avl=23 flg=05 value="? Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=32 kxsbbbfp=2a974dc3b8 bln=22 avl=22 flg=01 value=### An invalid number has been seen.Memory contents are : Dump of memory from 0x0000002A974CDE88 to 0x0000002A974CDE9E 2A974CDE80 3331312E 0406C102 [.113....] 2A974CDE90 C9BBC6C8 52024D01 30310248 00003304[.....M.RH.10.3..] Bind#2 oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=56 kxsbbbfp=2a974dc3d0 bln=32 avl=00 flg=01 Bind#3 oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00 oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=88 kxsbbbfp=2a974dc3f0 bln=32 avl=01 flg=01 value="M" …………
复制
从跟踪文件可以看到,问题是系统认为地址为2A974CDE88的数值非法,在导入会提示ORA-1438错误。本值在数据库中定义为number(8)类型,但从结果来看,系统把这个值当成了number(24)类型,并且由于格式不对,不能取出值。
3、找到DMP中相关数据
用winhex打开dmp文件,搜索十六进制串“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”(对trace文件中的值要进行顺序调整)
4、对导出数据进行更改
由于“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”不是一个有效的number类型,需要对它进行调整。调整如下,只要确保调整的值是number类型的即可。
5、对数据库相关数据项的类型进行调整
把数据项的类型由number(*)改为number(38);
6、重新进行数据导入
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008 Copyright (c) 1982, 2005, Oracle.All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options Export file created by EXPORT:V10.02.01 via conventional path import done in ZHS16GBK character set and ZHS16GBK NCHAR character set . importing DBA_MGR's objects into DBA_MGR . . importing table "T_ROUTDEF" 1646404
复制
本文通过介绍ErrorStack的应用,将这一常用的诊断工具介绍给大家,作为DBA,在遇到复杂问题时,通过这些手段深入分析,不仅有助于问题的解决,还将完善自己的知识体系,祝大家学习能够不断进步、深入。