暂无图片
暂无图片
7
暂无图片
暂无图片
1
暂无图片

震惊:祸不单行(空间满外加大事务)导致OGG噶了

656

一、背景描述

近期有朋友联系我,说他们一套Oracle 11g OGG 同步出现了问题,目标端OGG不能正常同步源端数据。源端和目标端都是Oracle 11g 单机库,让帮忙分析排查下问题。

由于挺长时间没有处理过OGG问题,现将操作记录整理。

如有不妥之处,还望对此熟悉的朋友能多多指正,不胜感激。

二、排查过程

通过朋友提供的服务器登录信息,分别登录了OGG 源端和目标端。

2.1 OGG源端

登录到OGG 源端,首先查看了下 OGG 进程状态,显示如下:

[oracle@xxxxxxdb1 ogg]$ ./ggsci Oracle GoldenGate Command Interpreter for Oracle Version 11.2.1.0.33 22986205 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_160502.0200_FBO Linux, x64, 64bit (optimized), Oracle 11g on May 2 2016 06:30:42 Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved. GGSCI (xxxxxxdb1) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER STOPPED EXTRACT ABENDED DPA 00:00:00 09:08:51 EXTRACT ABENDED EXA 00:16:32 09:08:52
复制

显示源端MGR管理进程、EXA抽取进程和DPA投递进程都停了。

接下来查看下OGG源端ggserr.log日志,看看什么原因导致源端进程宕了,查询到因为磁盘空间不足导致OGG源端同步进程宕,日志信息如下:

2024-04-11 23:07:21 INFO OGG-10028 Oracle GoldenGate Capture for Oracle, exa.prm: The DELETE TRANDATA command failed for the specified table, error [30,568], operation [ALTER TABLE "XXXX"."XXXX_ERP_INFO" DROP SUPPLEMENTAL LOG GROUP "GGS_98502" /* GOLDENGATE_DDL_REPLICATION */ (size 117)]. 2024-04-11 23:07:21 INFO OGG-10028 Oracle GoldenGate Capture for Oracle, exa.prm: The DELETE TRANDATA command failed for the specified table, error [30,568], operation [ALTER TABLE "XXXX"."XXXX_ERP_INFO" DROP SUPPLEMENTAL LOG GROUP "GGT_98502" /* GOLDENGATE_DDL_REPLICATION */ (size 117)]. 2024-04-11 23:07:21 INFO OGG-01054 Oracle GoldenGate Capture for Oracle, exa.prm: Recovery completed for target file ./dirdat/ea000156, at RBA 1413, CSN 537361183. 2024-04-11 23:07:21 INFO OGG-01057 Oracle GoldenGate Capture for Oracle, exa.prm: Recovery completed for all targets. 2024-04-11 23:07:21 INFO OGG-01487 Oracle GoldenGate Capture for Oracle, exa.prm: DDL found, operation [ ALTER TABLE XXXX_EMR_STAT ADD CONTROLID_TEMP VARCHAR2(10) (size 65)], start SCN [537361178], commit SCN [537361196] instance [orcl (1)], DDL seqno [37111], marker seqno [76481]. 2024-04-11 23:07:21 INFO OGG-00487 Oracle GoldenGate Capture for Oracle, exa.prm: DDL operation included [INCLUDE ALL], optype [ALTER], objtype [TABLE], objowner [XXXX], objname [XXXX_EMR_STAT]. 2024-04-11 23:07:21 INFO OGG-00497 Oracle GoldenGate Capture for Oracle, exa.prm: Writing DDL operation to extract trail file. 2024-04-11 23:07:21 INFO OGG-10028 Oracle GoldenGate Capture for Oracle, exa.prm: The DELETE TRANDATA command failed for the specified table, error [30,568], operation [ALTER TABLE "XXXX"."XXXX_EMR_STAT" DROP SUPPLEMENTAL LOG GROUP "GGT_98527" /* GOLDENGATE_DDL_REPLICATION */ (size 114)]. 2024-04-11 23:07:21 INFO OGG-00477 Oracle GoldenGate Capture for Oracle, exa.prm: Successfully added TRAN DATA for table [XXXX.XXXX_EMR_STAT], operation [ALTER TABLE "XXXX"."XXXX_EMR_STAT" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS /* GOLDENGATE_DDL_REPLICATION */ (size 114)]. 2024-04-11 23:07:21 INFO OGG-00479 Oracle GoldenGate Capture for Oracle, exa.prm: Successfully deleted TRAN DATA for [ALTER] DDL operation, table [XXXX.XXXX_EMR_STAT], operation [ALTER TABLE "XXXX"."XXXX_EMR_STAT" DROP SUPPLEMENTAL LOG GROUP "GGS_98527" /* GOLDENGATE_DDL_REPLICATION */ (size 114)]. 2024-04-11 23:07:30 WARNING OGG-00869 Oracle GoldenGate Capture for Oracle, exa.prm: No unique key is defined for table 'XXXX_INFO_HIS'. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key. 2024-04-11 23:09:05 WARNING OGG-00869 Oracle GoldenGate Capture for Oracle, exa.prm: No unique key is defined for table 'DATA_XXXX_EC'. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key. 2024-04-11 23:09:23 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, dpa.prm: Rolling over remote file ./dirdat/ra000146. 2024-04-11 23:09:59 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, exa.prm: Rolling over remote file ./dirdat/ea000156. 2024-04-11 23:10:15 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, dpa.prm: Rolling over remote file ./dirdat/ra000147. 2024-04-11 23:10:38 ERROR OGG-01096 Oracle GoldenGate Capture for Oracle, exa.prm: Unable to write to file "./dirdat/ea000157" (error 28, No space left on device). 2024-04-11 23:10:38 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, exa.prm: PROCESS ABENDING. 2024-04-11 23:14:14 INFO OGG-00975 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA starting. 2024-04-11 23:14:14 INFO OGG-00965 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA restarted automatically. 2024-04-11 23:14:14 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, exa.prm: EXTRACT EXA starting. 2024-04-11 23:14:14 INFO OGG-03035 Oracle GoldenGate Capture for Oracle, exa.prm: Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:. 2024-04-11 23:14:14 INFO OGG-02696 Oracle GoldenGate Capture for Oracle, exa.prm: NON-ANSI SQL parameter syntax is used for parameter parsing. 2024-04-11 23:14:14 INFO OGG-02095 Oracle GoldenGate Capture for Oracle, exa.prm: Successfully set environment variable ORACLE_SID=orcl. 2024-04-11 23:14:14 INFO OGG-02095 Oracle GoldenGate Capture for Oracle, exa.prm: Successfully set environment variable NLS_LANG=AMERICAN_AMERICA.ZHS16GBK. 2024-04-11 23:14:14 INFO OGG-02095 Oracle GoldenGate Capture for Oracle, exa.prm: Successfully set environment variable ORACLE_HOME=/u01/oracle/product/11.2.0/db_1. 2024-04-11 23:14:14 WARNING OGG-00363 Oracle GoldenGate Capture for Oracle, exa.prm: NUMFILES must occur before SOURCEDEFS/TARGETDEFS entries to have an effect. Parameter ignored. 2024-04-11 23:14:14 INFO OGG-01815 Oracle GoldenGate Capture for Oracle, exa.prm: Virtual Memory Facilities for: BR anon alloc: mmap(MAP_ANON) anon free: munmap file alloc: mmap(MAP_SHARED) file free: munmap target directories: /data/ogg/BR/EXA. 2024-04-11 23:14:14 INFO OGG-01815 Oracle GoldenGate Capture for Oracle, exa.prm: Virtual Memory Facilities for: COM anon alloc: mmap(MAP_ANON) anon free: munmap file alloc: mmap(MAP_SHARED) file free: munmap target directories: /data/ogg/dirtmp. 2024-04-11 23:24:14 INFO OGG-00975 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA starting. 2024-04-11 23:24:14 INFO OGG-00965 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA restarted automatically. 2024-04-11 23:24:15 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, exa.prm: EXTRACT EXA starting. 2024-04-11 23:24:15 INFO OGG-03035 Oracle GoldenGate Capture for Oracle, exa.prm: Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:. 2024-04-11 23:24:15 INFO OGG-02696 Oracle GoldenGate Capture for Oracle, exa.prm: NON-ANSI SQL parameter syntax is used for parameter parsing. 2024-04-11 23:24:15 ERROR OGG-00664 Oracle GoldenGate Capture for Oracle, exa.prm: OCI Error beginning session (status = 257-ORA-00257: archiver error. Connect internal only, until freed.). 2024-04-11 23:24:15 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, exa.prm: PROCESS ABENDING. 2024-04-11 23:28:15 INFO OGG-00975 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA starting. 2024-04-11 23:28:15 INFO OGG-00965 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXA restarted automatically. 2024-04-11 23:28:15 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, exa.prm: EXTRACT EXA starting. 2024-04-11 23:28:15 INFO OGG-03035 Oracle GoldenGate Capture for Oracle, exa.prm: Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:. 2024-04-11 23:28:15 INFO OGG-02696 Oracle GoldenGate Capture for Oracle, exa.prm: NON-ANSI SQL parameter syntax is used for parameter parsing. 2024-04-11 23:28:15 ERROR OGG-00664 Oracle GoldenGate Capture for Oracle, exa.prm: OCI Error beginning session (status = 257-ORA-00257: archiver error. Connect internal only, until freed.).
复制

根据日志内容,显示在2024-04-11 23:10:38 提示磁盘空间不足,导致进程宕,抽取进程多次尝试重复拉起最后失败。

源端报错已经很明确了,因为空间不足导致源端进程宕,那就查询下哪些文件占用占用了大量磁盘空间,是否可以删除,通过df -h查看到的信息如下:

[oracle@xxxxxxdb1 ~]$ df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 16G 0 16G 0% /dev tmpfs 16G 0 16G 0% /dev/shm tmpfs 16G 8.7M 16G 1% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/mapper/centos-root 35G 16G 20G 44% / /dev/vda1 1014M 157M 858M 16% /boot /dev/vdb 500G 63G 438G 13% /backup /dev/vda3 158G 150G 4.0K 100% /data tmpfs 3.2G 0 3.2G 0% /run/user/1001 tmpfs 3.2G 0 3.2G 0% /run/user/0
复制

通过上面显示/data目录空间已满,当前OGG部署在/data/ogg目录下,将/data目录下一些备份文件和日志文件进行删除,将/data目录空间降了下来。

然后重新启动源端OGG相应进程,查看了下当前mgr 配置文件中文件保留6天,文件都未丢,启动mgr管理进程和exa抽取进程都很顺利,但投递进程启动后不久就ABENDED,信息如下:

GGSCI (xxxxxxdb1) 2> start mgr Manager started. GGSCI (xxxxxxdb1) 3> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT ABENDED DPA 00:00:00 09:09:04 EXTRACT ABENDED EXA 00:16:32 09:09:05 GGSCI (xxxxxxdb1) 4> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT ABENDED DPA 00:00:00 09:09:06 EXTRACT ABENDED EXA 00:16:32 09:09:07 GGSCI (xxxxxxdb1) 5> dblogin userid ogg,password xxxx Successfully logged into database. GGSCI (xxxxxxdb1) 6> start exa Sending START request to MANAGER ... EXTRACT EXA starting GGSCI (xxxxxxdb1) 7> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT ABENDED DPA 00:00:00 09:10:07 EXTRACT RUNNING EXA 09:26:33 00:00:07 GGSCI (xxxxxxdb1) 8> start dpa Sending START request to MANAGER ... EXTRACT DPA starting GGSCI (xxxxxxdb1) 12> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING DPA 00:00:00 00:00:36 EXTRACT RUNNING EXA 09:27:03 00:00:37 GGSCI (xxxxxxdb1) 17> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING DPA 09:17:20 00:00:09 EXTRACT RUNNING EXA 09:17:12 00:00:03 GGSCI (xxxxxxdb1) 19> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING DPA 00:00:00 00:00:05 EXTRACT ABENDED EXA 09:16:56 00:01:23
复制

是什么原因投递进程失败呢,莫非目标端MGR进程也宕了,空间也满了,带着这种疑问我又查看了下目标端OGG。

2.2 OGG目标端

登录目标端,我首先查看了下目标端磁盘空间,显示目标端 / 根目录空间已满,信息如下:

[root@xxxxxxxx2 dpdata]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 7.8G 0 7.8G 0% /dev tmpfs 7.9G 0 7.9G 0% /dev/shm tmpfs 7.9G 129M 7.7G 2% /run tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup /dev/mapper/centos-root 35G 35G 2.0M 100% / /dev/vda1 1014M 155M 860M 16% /boot /dev/vda3 453G 167G 264G 39% /data tmpfs 1.6G 0 1.6G 0% /run/user/1001 tmpfs 1.6G 0 1.6G 0% /run/user/
复制

Oracle数据库安装在 / 根目录 /u01下,于是逐级排查了下占用根目录的文件,发现监听日志log_xxx.xml文件以及$ORACLE_BASE/trace下的ggs_ddl_trace.log 占用比较大,于是将相应的文件删除,将根目录磁盘空间降了下来,ggs_ddl_trace.log 这个文件记录了很多DDL日志,如下所示:

[root@xxxxxxxx2 trace]# more ggs_ddl_trace.log SESS 4294967295-2023-09-07 21:53:45 : DDL : Initial setup starting SESS 4294967295-2023-09-07 21:53:46 : DDL : STARTING DDL REPLICATION SETUP SESS 4294967295-2023-09-07 21:53:47 : DDL : Instantiating new DDL replication package SESS 4294967295-2023-09-07 21:53:57 : DDL : ************************* Start of log for DDL sequence [1], v[ $Id: ddl_setup.sql /st_oggco re_11.2.1/38 2015/05/18 05:12:21 sijenki Exp $ ] trace level [0], owner schema of DDL package [OGG], objtype [ROLE] name [.GGS_GGSUSER_R OLE] SESS 4294967295-2023-09-07 21:53:57 : DDLTRACE1 : Before Trigger: point in execution = [1.0], objtype [ROLE] name [.GGS_GGSUSER_ROLE] SESS 4294967295-2023-09-07 21:53:57 : DDL : DDL operation [DROP ROLE GGS_GGSUSER_ROLE ], sequence [1], DDL type [DROP] ROLE, real object type [ROLE], validity [], object ID [], object [.GGS_GGSUSER_ROLE], real object [.GGS_GGSUSER_ROLE], base object schema [], base object name [], logged as [SYS] SESS 4294967295-2023-09-07 21:53:57 : DDL : Start SCN found [1226155] SESS 4294967295-2023-09-07 21:53:57 : DDL : ------------------------- End of log for DDL sequence [1], no DDL history metadata recorded for this DDL operation SESS 4294967295-2023-09-07 21:53:57 : DDL : ************************* Start of log for DDL sequence [2], v[ $Id: ddl_setup.sql /st_oggco re_11.2.1/38 2015/05/18 05:12:21 sijenki Exp $ ] trace level [0], owner schema of DDL package [OGG], objtype [ROLE] name [.GGS_GGSUSER_R OLE] SESS 4294967295-2023-09-07 21:53:57 : DDLTRACE1 : Before Trigger: point in execution = [1.0], objtype [ROLE] name [.GGS_GGSUSER_ROLE] SESS 4294967295-2023-09-07 21:53:57 : DDL : DDL operation [CREATE ROLE GGS_GGSUSER_ROLE ], sequence [2], DDL type [CREATE] ROLE, real ob ject type [ROLE], validity [], object ID [], object [.GGS_GGSUSER_ROLE], real object [.GGS_GGSUSER_ROLE], base object schema [], base ob ject name [], logged as [SYS] SESS 4294967295-2023-09-07 21:53:57 : DDL : Start SCN found [1226168] SESS 4294967295-2023-09-07 21:53:57 : DDL : ------------------------- End of log for DDL sequence [2], no DDL history metadata recorded for this DDL operation SESS 4294967295-2023-09-07 21:53:57 : DDL : ************************* Start of log for DDL sequence [3], v[ $Id: ddl_setup.sql /st_oggco re_11.2.1/38 2015/05/18 05:12:21 sijenki Exp $ ] trace level [0], owner schema of DDL package [OGG], objtype [OBJECT PRIVILEGE] name [OG G.GGS_MARKER] SESS 4294967295-2023-09-07 21:53:57 : DDLTRACE1 : Before Trigger: point in execution = [1.0], objtype [OBJECT PRIVILEGE] name [OGG.GGS_M ARKER] SESS 4294967295-2023-09-07 21:53:57 : DDL : DDL operation [GRANT SELECT, DELETE ON "OGG"."GGS_MARKER" TO GGS_GGSUSER_ROLE ], sequence [3 ], DDL type [GRANT] OBJECT PRIVILEGE, real object type [OBJECT PRIVILEGE], validity [], object ID [], object [OGG.GGS_MARKER], real obje ct [OGG.GGS_MARKER], base object schema [], base object name [], logged as [SYS] SESS 4294967295-2023-09-07 21:53:57 : DDL : Start SCN found [1226180] SESS 4294967295-2023-09-07 21:53:57 : DDL : ------------------------- End of log for DDL sequence [3], no DDL history metadata recorded for this DDL operation
复制

将根目录空间降下来后,查看了下目标端OGG进程,显示如下:

[oracle@xxxxxxxx2 ogg]$ ./ggsci Oracle GoldenGate Command Interpreter for Oracle Version 11.2.1.0.33 22986205 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_160502.0200_FBO Linux, x64, 64bit (optimized), Oracle 11g on May 2 2016 06:30:42 Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved. GGSCI (xxxxxxxx2) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:03:13
复制

目标端RTA应用进程虽然现实状态为RUNNING,但其实已经不能正常应用日志,于是尝试启停RTA应用进程,如下所示。

[oracle@xxxxxxxx2 ogg]$ ./ggsci Oracle GoldenGate Command Interpreter for Oracle Version 11.2.1.0.33 22986205 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_160502.0200_FBO Linux, x64, 64bit (optimized), Oracle 11g on May 2 2016 06:30:42 Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved. GGSCI (xxxxxxxx2) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:03:13 GGSCI (xxxxxxxx2) 5> dblogin userid ogg,password xxxx Successfully logged into database. GGSCI (xxxxxxxx2) 6> stop rta Sending STOP request to REPLICAT RTA ... STOP request pending end-of-transaction (6126751 records so far).. GGSCI (xxxxxxxx2) 7> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:07:48 GGSCI (xxxxxxxx2) 22> send rta,status Sending STATUS request to REPLICAT RTA ... Current status: Processing data Sequence #: 147 RBA: 614612071 6408889 records in current transaction PENDING STOP request pending end-of-transaction (6408889 records so far) GGSCI (xxxxxxxx2) 23> kill rta Sending KILL request to MANAGER ... Killed process (179559) for REPLICAT RTA GGSCI (xxxxxxxx2) 24> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT ABENDED RTA 06:20:18 03:21:16 GGSCI (xxxxxxxx2) 25> info lal ERROR: Invalid command. GGSCI (xxxxxxxx2) 26> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT ABENDED RTA 06:20:18 03:21:19 GGSCI (xxxxxxxx2) 27> stop rta REPLICAT RTA is already stopped. GGSCI (xxxxxxxx2) 28> dblogin userid ogg,password ogg Successfully logged into database. GGSCI (xxxxxxxx2) 29> start rta Sending START request to MANAGER ... REPLICAT RTA starting GGSCI (xxxxxxxx2) 30> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:22:11 GGSCI (xxxxxxxx2) 31> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:22:13 GGSCI (xxxxxxxx2) 32> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RTA 06:20:18 03:22:14
复制

通过上面操作显示使用stop RTA无法停止应用进程,显示有事务存在,于是kill掉RTA进程,重启,虽然RTA显示启动,但观察Lag at Chkpt并无改变。

于是查询导致目标端应用进程异常的大事务信息,通过如下SQL语句查询大事务:

SQL> with ltr as ( 2 select to_char(sysdate,'YYYYMMDDHH24MISS') TM, 3 s.sid, 4 s.sql_id, 5 s.sql_child_number, 6 s.prev_sql_id, 7 xid, 8 to_char(t.start_date,'YYYYMMDDHH24MISS') start_time, 9 e.TYPE,e.block, 10 e.ctime, 11 decode(e.CTIME, 0, (sysdate - t.start_date) * 3600*24, e.ctime) el_second 12 -- q.sql_text 13 from v$transaction t, v$session s,v$transaction_enqueue e 14 where t.start_date <= sysdate - interval '100' second 15 and t.addr = s.taddr 16 --and s.sql_child_number = q.CHILD_NUMBER(+) 17 --and s.sql_id = q.sql_id(+) and s.prev_sql_id = q.sql_id(+) 18 and t.addr = e.addr(+) ) 19 select ltr.* , (select q1.sql_text from v$sql q1 where ltr.prev_sql_id = q1.sql_id(+) 20 and rownum = 1) prev_sql_text , 21 (select q1.sql_text from v$sql q1 where ltr.sql_id = q1.sql_id(+) 22 and ltr.sql_child_number = q1.CHILD_NUMBER(+)) sql_text 23 from ltr ltr; 20240412084328 1141 0cd4rf8jw6a86 1 0cd4rf8jw6a86 03000D0076C60300 20240412052434 TX 0 11933 11933 UPDATE /*+ RESTRICT_ALL_REF_CONS */ "XXXX"."XXX_EMR_STAT" SET "ID" = :a8,"CONTROLID" = :a9,"CALRECORDSID" = :a10,"CONTROLVAL" = :a11,"CONTROLPARENTID" = :a12,"CONTROLTEXT" = :a13,"PROGRAMID" = :a14,"CONTROLID_TEMP" = :a15 WHERE "ID" = :b0 AND "CONTROLID" = :b1 AND "CALRECORDSID" = :b2 AND "CONTROLVAL" is NULL AND "CONTROLPARENTID" = :b4 AND "CONTROLTEXT" is NULL AND "PROGRAMID" = :b6 AND "CONTROLID_TEMP" is NULL AND ROWNUM = 1 UPDATE /*+ RESTRICT_ALL_REF_CONS */ "XXXX"."XXX_EMR_STAT" SET "ID" = :a8,"CONTROLID" = :a9,"CALRECORDSID" = :a10,"CONTROLVAL" = :a11,"CONTROLPARENTID" = :a12,"CONTROLTEXT" = :a13,"PROGRAMID" = :a14,"CONTROLID_TEMP" = :a15 WHERE "ID" = :b0 AND "CONTROLID" = :b1 AND "CALRECORDSID" = :b2 AND "CONTROLVAL" is NULL AND "CONTROLPARENTID" = :b4 AND "CONTROLTEXT" is NULL AND "PROGRAMID" = :b6 AND "CONTROLID_TEMP" is NULL AND ROWNUM = 1
复制

查询下导致该大事务的表结构信息,如下所示。

SQL> select dbms_metadata.get_ddl('TABLE','XXX_EMR_STAT','XXXX') FROM DUAL; CREATE TABLE "XXXX"."XXX_EMR_STAT" ( "ID" NUMBER(8,0), "CONTROLID" VARCHAR2(10), "CORDSID" NUMBER(8,0) NOT NULL ENABLE, "CONTROLVAL" NUMBER(8,0), "CONTROLPARENTID" VARCHAR2(200), "CONTROLTEXT" VARCHAR2(4000), "PROGRAMID" NUMBER(8,0), "CONTROLID_TEMP" VARCHAR2(10), ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "USERS"
复制

查询到该表竟然没有主键,在没有主键的表上做了大量UPDATE操作,于是又查询了大事务的一些相关信息,如下所示。

SQL> select addr,xidusn,xidslot,xidsqn,status from v$transaction ; 000000017FD8C318 9 14 307790 ACTIVE SQL> select sid,username,sql_id,PREV_SQL_ID,sql_hash_value from v$session where taddr='000000017FD8C318'; 381 OGG 0cd4rf8jw6a86 0t3fycvxv9xwb 599992582 SQL> select sql_fulltext from v$sql t where t.sql_id='0cd4rf8jw6a86'; UPDATE /*+ RESTRICT_ALL_REF_CONS */ "XXXX"."XXX_EMR_STAT" SET "ID" = :a8,"CONTROLID" = :a9,"CALRECORDSID" = :a10,"CONTROLVAL" = :a11,"CONTROLPARENTID" = :a12,"CONTROLTEXT" = :a13,"PROGRAMID" = :a14,"CONTROLID_TEMP" = :a15 WHERE "ID" = :b0 AND "CONTROLID" = :b1 AND "CALRECORDSID" = :b2 AND "CONTROLVAL" is NULL AND "CONTROLPARENTID" = :b4 AND "CONTROLTEXT" is NULL AND "PROGRAMID" = :b6 AND "CONTROLID_TEMP" is NULL AND ROWNUM = 1 UPDATE /*+ RESTRICT_ALL_REF_CONS */ "XXXX"."XXX_EMR_STAT" SET "ID" = :a8,"CONTROLID" = :a9,"CALRECORDSID" = :a10,"CONTROLVAL" = :a11,"CONTROLPARENTID" = :a12,"CONTROLTEXT" = :a13,"PROGRAMID" = :a14,"CONTROLID_TEMP" = :a15 WHERE "ID" = :b0 AND "CONTROLID" = :b1 AND "CALRECORDSID" = :b2 AND "CONTROLVAL" is NULL AND "CONTROLPARENTID" = :b4 AND "CONTROLTEXT" is NULL AND "PROGRAMID" = :b6 AND "CONTROLID_TEMP" is NULL AND ROWNUM = 1
复制

通过view report rta 查询RTA应用进程信息也和上面查询到的信息一致。

GGSCI (xxxxxxxx2) 1> view report rta *********************************************************************** Oracle GoldenGate Delivery for Oracle Version 11.2.1.0.33 22986205 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_160502.0200_FBO Linux, x64, 64bit (optimized), Oracle 11g on May 2 2016 06:45:35 Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved. Starting at 2024-04-12 08:51:18 *********************************************************************** Operating System Version: Linux Version #1 SMP Wed Jan 9 14:40:28 EST 2019, Release 4.20.1-1.el7.elrepo.x86_64 Node: xxxxx Machine: x86_64 soft limit hard limit Address Space Size : unlimited unlimited Heap Size : unlimited unlimited File Size : unlimited unlimited CPU Time : unlimited unlimited Process id: 536 Description: *********************************************************************** ** Running with the following parameters ** *********************************************************************** 2024-04-12 08:51:18 INFO OGG-03035 Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:. 2024-04-12 08:51:18 INFO OGG-02696 NON-ANSI SQL parameter syntax is used for parameter parsing. REPLICAT rta SETENV (NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK") 2024-04-12 08:51:18 INFO OGG-02095 Successfully set environment variable NLS_LANG=AMERICAN_AMERICA.ZHS16GBK. USERID ogg, PASSWORD ********************************, ENCRYPTKEY default DBOPTIONS DEFERREFCONST GETTRUNCATES REPORT AT 06:00 REPORTCOUNT EVERY 30 MINUTES, RATE REPORTROLLOVER AT 02:00 REPERROR DEFAULT, ABEND ALLOWNOOPUPDATES DISCARDFILE ./dirrpt/repsa.dsc, APPEND, MEGABYTES 2000 DISCARDROLLOVER AT 02:00 ASSUMETARGETDEFS APPLYNOOPUPDATES DDLOPTIONS REPORT DDL INCLUDE ALL IGNORETRUNCATES DBOPTIONS SUPPRESSTRIGGERS REPERROR (0001, DISCARD) REPERROR (1403, DISCARD) DBOPTIONS LOBWRITESIZE 102400 BATCHSQL DDLERROR DEFAULT IGNORE RETRYOP MAXRETRIES 3 RETRYDELAY 5 MAP XXXX.*, TARGET XXXXXX.*; MAPEXCLUDE XXXX.SYS_EXPORT_SCHEMA_*; MAPEXCLUDE XXXX.SYS_EXPORT_SCHEMA_*; MAPEXCLUDE XXXX.SYS_EXPORT_SCHEMA_*; 2024-04-12 08:51:18 INFO OGG-01815 Virtual Memory Facilities for: COM anon alloc: mmap(MAP_ANON) anon free: munmap file alloc: mmap(MAP_SHARED) file free: munmap target directories: /data/ogg/dirtmp. CACHEMGR virtual memory values (may have been adjusted) CACHESIZE: 2G CACHEPAGEOUTSIZE (normal): 8M PROCESS VM AVAIL FROM OS (min): 4G CACHESIZEMAX (strict force to disk): 3.41G Database Version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production PL/SQL Release 11.2.0.4.0 - Production CORE 11.2.0.4.0 Production TNS for Linux: Version 11.2.0.4.0 - Production NLSRTL Version 11.2.0.4.0 - Production Database Language and Character Set: NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK" NLS_LANGUAGE = "AMERICAN" NLS_TERRITORY = "AMERICA" NLS_CHARACTERSET = "ZHS16GBK" *********************************************************************** ** Run Time Messages ** *********************************************************************** Opened trail file ./dirdat/ra000145 at 2024-04-12 08:51:19 Wildcard MAP resolved (entry XXXX.*): MAP "XXXX"."XXX_EMR_STAT", TARGET XXXX."XXX_EMR_STAT"; 2024-04-12 08:51:19 WARNING OGG-00869 No unique key is defined for table 'XXX_EMR_STAT'. All viable columns will be used to repr esent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key. Using following columns in default map by name: ID, CONTROLID, RECORDSID, CONTROLVAL, CONTROLPARENTID, CONTROLTEXT, PROGRAMID, CONTROLID_TEMP Using the following key columns for target table XXXX.XXX_EMR_STAT: ID, CONTROLID, RECORDSID, CONTROLVAL, CONTROLPARENTI D, CONTROLTEXT, PROGRAMID, CONTROLID_TEMP. 1096800 records processed as of 2024-04-12 09:21:19 (rate 609,delta 609) Switching to next trail file ./dirdat/ra000146 at 2024-04-12 09:22:02 due to EOF, with current RBA 1999999810 Opened trail file ./dirdat/ra000146 at 2024-04-12 09:22:02 2611200 records processed as of 2024-04-12 09:51:19 (rate 725,delta 841) 4130400 records processed as of 2024-04-12 10:21:20 (rate 764,delta 843) 2024-04-12 10:38:21 INFO OGG-01021 Command received from GGSCI: STATUS. Switching to next trail file ./dirdat/ra000147 at 2024-04-12 10:41:50 due to EOF, with current RBA 1999999858 Opened trail file ./dirdat/ra000147 at 2024-04-12 10:41:50 2024-04-12 10:42:10 INFO OGG-01021 Command received from GGSCI: STATUS. 2024-04-12 10:43:33 INFO OGG-01021 Command received from GGSCI: STATUS. 5652000 records processed as of 2024-04- 10:51:21 (rate 784,delta 844)
复制

三、问题处理

综合以上分析,建议跳过该事务,另外建议对源端表XXXX."XXX_EMR_STAT"进行改造,增加主键。

目前这套OGG同步环境,目标端设置了一些策略,源端做truncate和delete都不会在目标端同步操作,目标端数据只增不减。

对于能否跳过事务,我这里也仅给出一些建议,还是需要朋友他们进行评估。

以下附跳过事务的操作。

目标端使用logdump查看下一位置对应RBA信息。

1)使用info all查看应用进程打开的文件名

GGSCI (xxxxxxxx2) 1> info rta REPLICAT RTA Last Started 2024-04-12 08:51 Status RUNNING Checkpoint Lag 00:00:00 (updated 00:00:01 ago) Log Read Checkpoint File ./dirdat/ra000145 2024-04-15 16:37:21.875842 RBA 1440176698
复制

2)使用logdump查看下一位置对应rba信息

[oracle@xxxxxxxx2 ogg]$ logdump Oracle GoldenGate Log File Dump Utility for Oracle Version 11.2.1.0.33 22986205 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_160502.0200 Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved. Logdump 5 >open /data/ogg/dirdat/ra000145 # 此处为上面info rta查看到的文件信息 Current LogTrail is /data/ogg/dirdat/ra000145 Logdump 6 >pos 1440176698 # 上面查看到的RBA 信息 Reading forward from RBA 1440176698 Logdump 7 >n # 输入n获取当前RBA位置 2024/04/11 23:04:16.002.628 FieldComp Len 93 RBA 1440176698 Name: XXXX.XXX_EMR_STAT Before Image: Partition 4 G b 0000 000a 0000 0000 0000 0116 48b2 0001 0007 0000 | ............H....... 0003 3139 3300 0200 0a00 0000 0000 0000 01b8 c700 | ..193............... 0300 0aff ff00 0000 0000 0000 0000 0400 0600 0000 | .................... 0238 3200 0500 04ff ff00 0000 0600 0a00 0000 0000 | .82................. 0000 0000 d200 0700 04ff ff00 00 | ............. Logdump 8 >n # 再次输入n获取下一RBA位置 2024/04/11 23:04:16.002.628 GGSPKUpdate Len 191 RBA 1440176924 Name: XXXX.XXX_EMR_STAT After Image: Partition 4 G m 005d 0000 000a 0000 0000 0000 0116 48b2 0001 0007 | .]............H..... 0000 0003 3139 3300 0200 0a00 0000 0000 0000 01b8 | ....193............. c700 0300 0aff ff00 0000 0000 0000 0000 0400 0600 | .................... 0000 0238 3200 0500 04ff ff00 0000 0600 0a00 0000 | ...82............... 0000 0000 0000 d200 0700 04ff ff00 0000 0000 0a00 | .................... 0000 0000 0001 1648 b200 0100 0700 0000 0331 3933 | .......H.........193 0002 000a 0000 0000 0000 0001 b8c7 0003 000a ffff | ....................
复制

3)跳过该事物

使用如下命令跳过事务

alter replicat 进程名 extseqno seqno号 extrba 第二次输入n获取到的RAB值信息 操作如下: GGSCI (xxxxxxxx2) 1> stop rta GGSCI (xxxxxxxx2) 1> alter replicat rta extseqno 1 extrba 1440176924
复制

4)重启应用进程

GGSCI (xxxxxxxx2) 1> start rta GGSCI (xxxxxxxx2) 1> info rta
复制

image.png

也欢迎关注我微信公众号

image.png

最后修改时间:2024-04-17 22:19:41
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论

锁钥
暂无图片
0年前
评论
暂无图片 0
祸不单行(空间满外加大事务)导致OGG噶了
0年前
暂无图片 点赞
评论