背景
mysql定期归档到oracle12c任务,这个月突然卡住了,mysql 批量生成的insert 脚本执行时hang住了。脚本如下:
#!/bin/bash $ORACLE_HOME/bin/sql usern/pass >> imp.log <<eof set define off show user; @imp.sql commit; set time on; exit; eof
复制
问题分析
1、等待事件趋势
第一次排查hang后通过event入手
TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION COUNT(*) ------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- ---------- 2021-08-12 18 2003 9370 latch: shared pool WAITING 1429 1 2021-08-12 18 2003 9370 ON CPU 1912 2021-08-12 19 2003 9370 ON CPU 1833 2021-08-12 20 2003 9370 latch: shared pool WAITING 1706 1 2021-08-12 20 2003 9370 ON CPU 1912 2021-08-12 21 2003 9370 ON CPU 1880 2021-08-12 22 2003 9370 latch: shared pool WAITING 866 2 2021-08-12 22 2003 9370 ON CPU 1900 2021-08-12 23 2003 9370 ON CPU 1917 2021-08-13 00 2003 9370 SQL*Net break/reset to client WAITING 1 2021-08-13 00 2003 9370 ON CPU 1923 TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION COUNT(*) ------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- ---------- 2021-08-13 01 2003 9370 ON CPU 1864 2021-08-13 02 2003 9370 ON CPU 1844 2021-08-13 03 2003 9370 ON CPU 1897 2021-08-13 04 2003 9370 latch: shared pool WAITING 1431 1 2021-08-13 04 2003 9370 ON CPU 1841 2021-08-13 05 2003 9370 ON CPU 1910 2021-08-13 06 2003 9370 ON CPU 881 最后的blcking session 1431,12c⾃动捕获报告特性.但频率不是很高,时间也对应不上,凌晨6点卡住,最后block是凌晨5点 TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION MODULE COUNT(*) ------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- -------------------- ---------- 2021-08-12 15 1431 3223 ON CPU MMON_SLAVE 11 2021-08-12 15 1431 31349 ON CPU MMON_SLAVE 1 2021-08-12 16 1431 55435 ON CPU MMON_SLAVE 1 2021-08-12 16 1431 62320 ON CPU MMON_SLAVE 1 2021-08-12 16 1431 64971 ON CPU MMON_SLAVE 1 2021-08-12 17 1431 49301 ON CPU MMON_SLAVE 1 2021-08-12 17 1431 54968 ON CPU MMON_SLAVE 1 2021-08-12 23 1431 23088 ON CPU MMON_SLAVE 1 2021-08-12 23 1431 38171 ON CPU MMON_SLAVE 1 2021-08-13 02 1431 36114 ON CPU MMON_SLAVE 1 2021-08-13 02 1431 39168 ON CPU MMON_SLAVE 1 TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION MODULE COUNT(*) ------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- -------------------- ---------- 2021-08-13 04 1431 14682 ON CPU MMON_SLAVE 1 2021-08-13 04 1431 20053 ON CPU MMON_SLAVE 1 2021-08-13 05 1431 62271 ON CPU MMON_SLAVE 1 2021-08-13 09 1431 47165 ON CPU MMON_SLAVE 1
复制
2、锁信息
EVENT SQL_ID MODULE STATUS STATE BLOCKING_SESSION LOCKWAIT WAIT_TIME ------------------------------ -------------------- ------------------------------ -------- ------------------- ---------------- ---------------- ---------- SQL*Net message from client java@xxxx (TNS V1-V3) INACTIVE WAITING 从这看很奇怪,会话的等待为SQL*Net message from client,从lock看也没有blocking。 SID TYPE HOLD REQUEST ID1 ID2 CTIME BLOCK_OTHERS ---------- ------------ ------------ ------------ ---------- ---------- ---------- ---------------- 4 AE Share None 133 0 6964804 Not Blocking 12 AE Share None 133 0 57237 Not Blocking 864 AE Share None 133 0 71450 Not Blocking 866 AE Share None 133 0 2083 Not Blocking 1139 Redo Thread Exclusive None 1 0 6964808 Not Blocking 1141 AE Share None 133 0 64226 Not Blocking 1423 XR Null None 4 0 6964813 Not Blocking 1423 RD Null None 1 0 6964813 Not Blocking 1423 Control File Row-S (SS) None 0 0 6964813 Not Blocking 1423 RS Row-S (SS) None 25 1 6964808 Not Blocking 1429 AE Share None 133 0 74523 Not Blocking 1706 KT Share None 18974 0 32728 Not Blocking 1706 KD Exclusive None 0 0 6964805 Not Blocking 1717 AE Share None 133 0 0 Not Blocking 1991 Temp Segment Row-X (SX) None 3 1 6964805 Not Blocking 2003 AE Share None 133 0 82454 Not Blocking 2003 DML Row-X (SX) None 328259 0 82451 Not Blocking 2003 Transaction Exclusive None 327704 292546 82451 Not Blocking
复制
3、分析10046 trace ,截取部分内容如下:
SO: 0xb1118e88, type: 109, owner: 0xee4b9740, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3 proc=0xed7414c8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9855, pg=0, conuid=0 LibraryObjectLock: Address=0xb1118e88 Handle=0x89c36e20 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 Context=0x7f04ca789660 User=0xee4b9740 Session=0xee4b9740 ReferenceCount=1 Flags=[0000] SavepointNum=0 Time=08/13/2021 06:28:25 LibraryHandle: Address=0x89c36e20 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD Name: Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=1 ActiveLocks=1 TotalLockCount=1 TotalPinCount=2 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 Concurrency: DependencyMutex=0x89c36ed0(0, 0, 0, 0) Mutex=0x7b9e2598(0, 73, 0, 0) Flags=RON/PIN/PN0/EXP/CHD/[10012111] Flags2=[0000] WaitersLists: Lock=0x89c36eb0[0x89c36eb0,0x89c36eb0] Pin=0x89c36e90[0x89c36e90,0x89c36e90] LoadLock=0x89c36f08[0x89c36f08,0x89c36f08] LibraryObject: Address=0x8bbc4a28 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] NamespaceDump: Child Cursor: Heap0=0x8bbc4af8 Heap6=0xabc80400 Heap0 Load Time=08-13-2021 06:28:25 Heap6 Load Time=08-13-2021 06:28:25 ---------------------------------------- SO: 0xc3005ff8, type: 109, owner: 0xee4b9740, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3 proc=0xed7414c8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9855, pg=0, conuid=0 LibraryObjectLock: Address=0xc3005ff8 Handle=0x7b9e2448 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1 User=0xee4b9740 Session=0xee4b9740 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=1a1607ca Time=08/13/2021 06:28:25 LibraryHandle: Address=0x7b9e2448 Hash=51d22eff LockMode=N PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=INSERT INTO table—name VALUES (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) 10046的trace中并没有发现异常,在对应insert hang的时间点发现最后执行的insert 语句也是正常的。 FullHashValue=b27cdaeb5a2603c11dfc564551d22eff Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=1372729087 OwnerIdn=80 Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 Concurrency: DependencyMutex=0x7b9e24f8(0, 7, 0, 0) Mutex=0x7b9e2598(0, 73, 0, 0) Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] WaitersLists: Lock=0x7b9e24d8[0x7b9e24d8,0x7b9e24d8] Pin=0x7b9e24b8[0x7b9e24b8,0x7b9e24b8] LoadLock=0x7b9e2530[0x7b9e2530,0x7b9e2530] Timestamp: Current=08-13-2021 06:28:25 HandleReference: Address=0x7b9e26f0 Handle=(nil) Flags=[00] LibraryObject: Address=0xb3bae630 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] ChildTable: size='16' Child: id='0' Table=0xb3baf4b0 Reference=0xb3baefa0 Handle=0x89c36e20 NamespaceDump: Parent Cursor: sql_id=1vz2q8p8x4brz parent=0xb3bae700 maxchild=1 plk=y ppn=n prsfcnt=0 obscnt=0
复制
处理思路
好像没有头绪了,操作系统信息,alert日志等也没有异常。多次测试都在凌晨4 5点发生异常,怀疑是不是这个时间点受oracle job问题。做了如下操作尝试:
禁用sql tuning 与space advisor exrc dbms_auto_task_admin.disable('auto space advisor',NULL,NULL); exec dbms_auto_task_admin.disable('sql tuning advisor',NULL,NULL); 并锁住导入表,避免收集统计信息 EXEC DBMS_STATS.delete_table_stats('<owner name>','<table name>'); EXEC DBMS_STATS.LOCK_TABLE_STATS ('owner name', 'table name');
复制
再次尝试仍是相同问题。并且测试换个时间段在白天也复现了。继续返回trace文件分析。上述trace中记录到的最后一条insert 语句是正常的,分析批量脚本。
grep '001' imp.sql -C 3 截取部分数据,001是trace中记录到的最后insert。 INSERT INTO tf_b_sms_receive_202009 VALUES ('001'); 001的下一行002 乱码与\' \0 INSERT INTO tf_b_sms_receive_202009 VALUES ('002','�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�','�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�');
复制
直接删除这条语句再试,成功导入。
模拟分析
删除后虽然跑成功了,但还要细看是什么原因导致的,如何避免,几百万行导入不可能从头排查。
$ORACLE_HOME/bin/sql t1/t1 > /home/oracle/test/imp.log <<eof set define off show user; alter session set max_dump_file_size = unlimited; alter session set events '10046 trace name context forever, level 12'; @/home/oracle/test/hang.sql commit; set time on; exit; eof
复制
测试环境执行脚本导入并生成10046 trace分析,结果和上述trace相同,仍然到问题行就hang住,且trace中没有记录解析此sql任何信息。
使用sql developer 图形化工具单独执行报错ORA-01756。
'�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�'
复制
尝试将’替换为\。再次执行通过。这其实也是调用$ORACLE_HOME/bin/sql的原因。使用sqlplus是直接报错跳过的,但sql支持更长的insert字符,并且其他的符号兼容相对较好,各有利弊吧。
预防其实很简单,导入前批量替换 \’。如担心误伤,可以通过grep -n 将带\'的sql先抓出来,然后sed -i 删除指定行。
最后修改时间:2021-08-24 15:50:49
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
我还是看到了 \0
3年前

1
相关阅读
Oracle DataGuard高可用性解决方案详解
孙莹
558次阅读
2025-03-26 23:27:33
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
518次阅读
2025-04-15 17:24:06
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
422次阅读
2025-04-18 14:18:38
XTTS跨版本迁移升级方案(11g to 19c RAC for Linux)
zwtian
420次阅读
2025-04-08 09:12:48
墨天轮个人数说知识点合集
JiekeXu
420次阅读
2025-04-01 15:56:03
Oracle SQL 执行计划分析与优化指南
Digital Observer
416次阅读
2025-04-01 11:08:44
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
381次阅读
2025-04-20 10:07:02
Oracle 19c RAC更换IP实战,运维必看!
szrsu
360次阅读
2025-04-08 23:57:08
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
335次阅读
2025-04-17 17:02:24
oracle定时任务常用攻略
virvle
325次阅读
2025-03-25 16:05:19