
Changing a Database Primary Key's Length Prevents the GoldenGate Replicat from Starting: OGG-01296 and ABEND on error 0

DB说 2019-01-14

【Background】

      A few days ago a friend messaged me to say that a GoldenGate Replicat process would not start. The logs are shown below.

Error log:

2019-01-11 17:56:34  ERROR   OGG-01296  Error mapping from SOURCE.AA  to TARGET.AA

Discard file:

Mapping error to target column:  DID

Mapping error to target column:  DID

Current time: 2019-01-11 17:34:31

Discarded record from action ABEND on error 0 -- note the ABEND on error 0

 

 

Aborting transaction on ./dirdat/lq beginning at seqno 15172 rba 475244077

                         error at seqno 15172 rba 475427432

Problem replicating SOURCE.AA  toTARGET.AA

Mapping problem with insert record (source format)...

*

DID = <Raw Data>         ------------------> this primary key is a number, yet it is not displayed correctly and shows up as <Raw Data>

000000: d5 11               

DCODE = 0001

ORTYPE = <Raw Data>

000000: c5 e4 bc fe c6 d5 cd a8 b6 a9 b5 a5             |............    |

*

Process Abending : 2019-01-11 17:34:31
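The `<Raw Data>` bytes of a character column in a discard file can often be read by decoding them with the database character set. The sketch below decodes the ORTYPE bytes from the excerpt above. The choice of GBK is an assumption (the article does not state the database character set), and `decode_raw` is a hypothetical helper name.

```python
# Bytes copied from the hex dump of the ORTYPE column in the discard file.
hex_dump = "c5 e4 bc fe c6 d5 cd a8 b6 a9 b5 a5"


def decode_raw(hex_text: str, encoding: str = "gbk") -> str:
    """Turn a space-separated hex dump into text using the given encoding.

    The default encoding is an assumption; use the real database
    character set if it is known.
    """
    return bytes.fromhex(hex_text).decode(encoding)


text = decode_raw(hex_dump)
print(text)
```

If the decoded text is readable, the column content itself is intact and the problem lies elsewhere in the record.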

 

【Environment】

      OS: Windows 2008 R2

      DB: 11.2.0.4

      OGG: 11.2.1.0.17

      Note: working on Windows takes some getting used to, debugging included. Worse, the error output is not even the same as on Linux.

      Table structure:

     CREATE TABLE AA (
         DID     NUMBER(20) NOT NULL PRIMARY KEY,
         DCCODE  VARCHAR2(50),
         OTYPE   VARCHAR2(50),
         DIST    NUMBER(18,2),
         CEDATE  DATE,
         DDLINE  DATE,
         CODE    VARCHAR2(50)
     )

 

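【Analysis】

   1. Use logdump to inspect the trail data. In the hex dump, the primary key appears as 6061123939.

The discard file does not show this value. It only shows <Raw Data> 000000: d5 11, and no base conversion of d5 11 produced anything that matched.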
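The base conversions mentioned above can be sketched as follows. This is only an illustration: treating 6061123939 as the decimal key value is an assumption, and the four interpretations tried here are common guesses, not a statement of how GoldenGate encodes the column.

```python
raw = bytes.fromhex("d5 11")  # the two bytes shown for DID in the Windows discard file
expected = 6061123939         # key value seen in logdump (assumed to be decimal)

# Try the usual integer readings of a two-byte field.
candidates = {
    "big-endian unsigned": int.from_bytes(raw, "big"),
    "little-endian unsigned": int.from_bytes(raw, "little"),
    "big-endian signed": int.from_bytes(raw, "big", signed=True),
    "little-endian signed": int.from_bytes(raw, "little", signed=True),
}

for name, value in candidates.items():
    print(f"{name:>24}: {value}  match={value == expected}")
```

Two bytes can represent at most 65535 as an unsigned integer, so no plain integer reading of `d5 11` can reach a ten-digit key. That is consistent with the discard output being a symptom of the mapping failure and not a usable copy of the value.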



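2. Comparing the table structures on both sides showed that they were identical.

Starting with 12.2, table definitions are stored in the trail file and can be retrieved with the logdump command SCANFORMETADATA:

logdump> SCANFORMETADATA

On versions earlier than 12.2, they can be captured from GGSCI with:

capture tabledefs xx.xx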
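Comparing the two live tables is not enough when the trail still holds records written under an older definition. A minimal sketch of comparing the definition the trail data was written with against the current target table follows. The dictionaries are hand-entered, hypothetical values (only DID's lengths 10 and 20 come from this article), and `diff_definitions` is an illustrative helper, not a GoldenGate API.

```python
from typing import Dict, List, Tuple

ColumnDef = Tuple[str, int]  # (data type, declared length or precision)


def diff_definitions(trail: Dict[str, ColumnDef],
                     target: Dict[str, ColumnDef]) -> List[str]:
    """Return human-readable differences between two sets of column definitions."""
    problems = []
    for column in sorted(set(trail) | set(target)):
        if column not in trail:
            problems.append(f"{column}: missing from trail definition")
        elif column not in target:
            problems.append(f"{column}: missing from target table")
        elif trail[column] != target[column]:
            problems.append(
                f"{column}: trail {trail[column]} vs target {target[column]}"
            )
    return problems


# Hypothetical, hand-entered definitions for illustration only.
trail_def = {"DID": ("NUMBER", 10), "DCCODE": ("VARCHAR2", 50)}
target_def = {"DID": ("NUMBER", 20), "DCCODE": ("VARCHAR2", 50)}

print(diff_definitions(trail_def, target_def))
```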


 

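3. Use GoldenGate's SQL debug mode to print the statements being executed:

Note: strangely, in debug mode the process exited immediately without printing anything. Normally every SQL statement is printed, whatever it is.

Debug-mode log: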

 

2019-01-11 17:56:34  WARNING OGG-01431  Aborted grouped transaction on 'SOURCE.AA', Mapping error.

 

2019-01-11 17:56:34  WARNING OGG-01003  Repositioning to rba 475244077 in seqno 15172.

 

2019-01-11 17:56:34  WARNING OGG-01151  Error mapping from SOURCE.AA to SOURCE.AA.

 

2019-01-11 17:56:34  WARNING OGG-01003  Repositioning to rba 475244077 in seqno 15172.

 

Source Context :

  SourceModule            : [er.errors]

  SourceID                : [er/errors.cpp]

  SourceFunction          : [take_rep_err_action]

  SourceLine              : [632]

  ThreadBacktrace         : [11] elements

                          : [D:\dmsjk\gglog.dll(??1CContextItem@@UEAA@XZ+0x34f3) [0x0000000180114DE3]]

                          : [D:\dmsjk\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x138) [0x00000001800AE3D8]]

                          : [D:\dmsjk\replicat.exe(ERCALLBACK+0x72ae) [0x000000014009CA3E]]

                          : [D:\dmsjk\replicat.exe(ERCALLBACK+0x36577) [0x00000001400CBD07]]

                          : [D:\dmsjk\replicat.exe(ERCALLBACK+0x5c98a) [0x00000001400F211A]]

                          : [D:\dmsjk\replicat.exe(_ggTryDebugHook+0x13a34) [0x00000001401D05F4]]

                          : [D:\dmsjk\replicat.exe(_ggTryDebugHook+0x12ad3) [0x00000001401CF693]]

                          : [D:\dmsjk\replicat.exe(ERCALLBACK+0x5ce90) [0x00000001400F2620]]

                          : [D:\dmsjk\replicat.exe(CommonLexerNewSSD+0xcf20) [0x00000001402B2F70]]

                          : [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x000000007760F56D]]

                          : [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x0000000077743281]]

 

2019-01-11 17:56:34  ERROR   OGG-01296  Error mapping from SOURCE.AA to SOURCE.AA.

 

***********************************************************************

*                   ** Run Time Statistics **                         *

***********************************************************************

 

Reading ./dirdat/lq015172, current RBA 475427432, 0 records

 

Report at 2019-01-11 17:56:34 (activity since 2019-01-11 17:56:32)

 

From Table SOURCE.AA  toSOURCE.AA:

       #                   inserts:            0

       #                   updates:          0

       #                   deletes:            0

       #                  discards:            1

 

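4. logdump shows the first column with a length of 10, but the table structure shows 20.

After asking around, I learned that DDL replication was configured on the source OGG and that the primary key's length had been changed from 10 to 20 directly in the database. On the target, the process had failed for a specific reason, so the DDL was not replicated normally and the operations staff ran the DDL by hand. Some time after that change, the OGG operations staff found that the Replicat process was down. To verify this sequence of events, I created an identical table with a primary key length of 10 (the existing table contains data, so its column length cannot be reduced).

With debug mode enabled again, the SQL was printed.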


 

 

5. While reproducing the scenario, I found that the discard output on Linux differs considerably from the output on Windows.

Discard file on Linux:

Current time: 2019-01-14 17:00:19

Discarded record from action ABEND on error 0

 

Aborting transaction on ./dirdat/lq beginning at seqno 15172 rba 475427432

                         error at seqno 15172 rba 475427432

Problem replicating DMS.T_SH_DISCOUNT to TARGET.T_SH_DISCOUNT

Mapping problem with insert record (source format)...

*

DID = ---> this is empty

DCODE = 5582

OTYPE = <Raw Data>

000000: c5 e4 bc fe c6 d5 cd a8 b6 a9 b5 a5             |............    |

 

Discard file on Windows:

Aborting transaction on ./dirdat/lq beginning at seqno 15172 rba 475244077

                         error at seqno 15172 rba 475427432

Problem replicating SOURCE.AA  toTARGET.AA

Mapping problem with insert record (source format)...

*

DID = <Raw Data>         ------------------> this primary key is a number, yet it is not displayed correctly and shows up as <Raw Data>

000000: d5 11               

DCODE = 0001

ORTYPE = <Raw Data>

000000: c5 e4 bc fe c6 d5 cd a8 b6 a9 b5 a5             |............    |

*

Process Abending : 2019-01-11 17:34:31
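Because the column values are rendered differently on the two platforms, the more dependable clues are the fixed text lines. A rough sketch for pulling those out of a discard file is shown below. The patterns are based only on the excerpts in this article, `summarize_discard` is a hypothetical helper, and other OGG versions may word these lines differently.

```python
import re


def summarize_discard(text: str) -> dict:
    """Extract the failing table pair and the target columns named in mapping errors."""
    # "\s*" after "to" tolerates the missing space seen in the excerpt above.
    tables = re.search(r"Problem replicating\s+(\S+)\s+to\s*(\S+)", text)
    columns = re.findall(r"Mapping error to target column:\s*(\S+)", text)
    return {
        "source": tables.group(1) if tables else None,
        "target": tables.group(2) if tables else None,
        "columns": sorted(set(columns)),
    }


# Lines copied from the Windows discard file shown in this article.
sample = """\
Mapping error to target column:  DID
Mapping error to target column:  DID
Problem replicating SOURCE.AA  toTARGET.AA
Mapping problem with insert record (source format)...
"""

print(summarize_discard(sample))
```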

 

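【Summary】

  1. When a table's structure is changed, the target must finish applying all of the backlog that predates the change before the DDL is applied there.

  2. When DDL replication is configured, the source and target are kept in sync automatically and no manual intervention is needed.

  3. Read the report log and the discard file closely to find clues.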
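The first point can be expressed as a simple ordering check on trail positions. This is a sketch under stated assumptions: a position is modeled as a (seqno, RBA) pair, `backlog_applied` is a hypothetical helper, and the DDL position used here is invented for illustration. Only the Replicat position comes from the log in this article.

```python
from typing import Tuple

TrailPosition = Tuple[int, int]  # (trail sequence number, RBA within that file)


def backlog_applied(replicat_pos: TrailPosition, ddl_pos: TrailPosition) -> bool:
    """True once Replicat has reached the point in the trail where the DDL occurred.

    Tuples compare lexicographically, so seqno is compared first, then RBA.
    """
    return replicat_pos >= ddl_pos


replicat_pos = (15172, 475244077)  # position Replicat kept repositioning to in the log
ddl_pos = (15180, 0)               # hypothetical position of the DDL in the trail

print(backlog_applied(replicat_pos, ddl_pos))
```

Running the DDL manually on the target while this check is still false leaves old-format records in the trail that no longer match the altered table, which is the situation described in step 4.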

 


Last modified: 2020-11-25 18:10:05