问题描述
嗨,汤姆,
我有一个非常大的表,有超过8.5亿行的数据。我们正在使用CDC将数据从源系统提取到发布目标,并将etl提取到数据仓库和ODS。我需要运行定期检查,以确保两个系统之间的数据同步,如果不同步,则使它们同步。
由于数据规模庞大,我认为DBMS_COMPARISON.CREATE_COMPARISON将是一个很好的候选人。但是,当我尝试运行它时,出现错误: ora-23626: 'schema.indexname' 不合格索引错误。通过文档,我发现表的主键是由7列组成的组合,具有数字和nchar数据类型的组合。结果,它没有通过DBMS_COMPARISON的要求。
你还能推荐什么?减号查询是我唯一的选择吗?
非常感谢你的帮助。
邦妮
我有一个非常大的表,有超过8.5亿行的数据。我们正在使用CDC将数据从源系统提取到发布目标,并将etl提取到数据仓库和ODS。我需要运行定期检查,以确保两个系统之间的数据同步,如果不同步,则使它们同步。
由于数据规模庞大,我认为DBMS_COMPARISON.CREATE_COMPARISON将是一个很好的候选人。但是,当我尝试运行它时,出现错误: ora-23626: 'schema.indexname' 不合格索引错误。通过文档,我发现表的主键是由7列组成的组合,具有数字和nchar数据类型的组合。结果,它没有通过DBMS_COMPARISON的要求。
你还能推荐什么?减号查询是我唯一的选择吗?
非常感谢你的帮助。
邦妮
专家解答
这里有一些你可以探索的选择
所以我们从传统的减号开始
问题是-我们能做得更好吗?数据库可以是最好的事情之一是哈希连接。因此 (以更多的SQL代码为代价),我们可以使用完整的外部连接语法进行类似的比较。在下面的示例中,我们假设表的逻辑主键是 “object_id”。
现在,这是更快的方式,并且 “工作”,因为它发现缺少 * 行 *,但它没有找到在源和目标中都存在键但数据不同的行。如果这是一部戏剧,那么我们可以使用非键列来增强查询,并使用一些解码进行比较。
所以我们可以实现要求,仍然使用联接。
需要注意的另一件事是,真正的 “源” 与 “目标” 比较的一大成本是,如果它们在不同的数据库上,那么我们必须在网络上拖动这些表中的一个。在这种情况下,您可能需要考虑一个潜在的折衷方案,即我们自己进行一些散列,因此我们仅将hashkey拖到电线上,例如
最后,有时 “分层” 方法也许是进行 “足够好” 检查的一种手段。例如,您可能会做一些由所有者雕刻的事情,例如
这意味着网络上的数据最少,但是如果您 * 确实 * 发现差异,则需要进一步的工作。
希望这有所帮助。
SQL> create table t1 as select * from dba_objects where object_id is not null; Table created. SQL> create table t2 as select * from t1; Table created. -- -- Now I'll 'manipulate' t2 so its slightly different from t1 -- SQL> delete from t2 where owner = 'SCOTT' and object_name = 'EMP'; 1 row deleted. SQL> update t2 set owner = 'SCOTT2' where object_name = 'DEPT' and owner = 'SCOTT'; 1 row updated.复制
所以我们从传统的减号开始
SQL> set autotrace on SQL> select * 2 from 3 ( 4 select * from t1 5 minus 6 select * from t2 7 union all 8 select * from t2 9 minus 10 select * from t1 11 ); OWNER OBJECT_NAME ------------------------------ ---------------------------------------- SUBOBJECT_NAME ---------------------------------------------------------------------------------------------------- OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE CREATED LAST_DDL_ TIMESTAMP STATUS T ---------- -------------- ----------------------- --------- --------- ------------------- ------- - G S NAMESPACE - - ---------- EDITION_NAME ---------------------------------------------------------------------------------------------------- SHARING E O ------------- - - SCOTT2 DEPT 92609 170186 TABLE 20-NOV-15 02-MAR-17 2015-11-20:15:05:06 VALID N N N 1 NONE N 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 633555309 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 208K| 73M| | 12003 (1)| 00:00:01 | | 1 | VIEW | | 208K| 73M| | 12003 (1)| 00:00:01 | | 2 | MINUS | | | | | | | | 3 | SORT UNIQUE | | 208K| 45M| 62M| 12003 (1)| 00:00:01 | | 4 | UNION-ALL | | | | | | | | 5 | MINUS | | | | | | | | 6 | SORT UNIQUE | | 104K| 11M| 15M| 3001 (1)| 00:00:01 | | 7 | TABLE ACCESS FULL| T1 | 104K| 11M| | 313 (1)| 00:00:01 | | 8 | SORT UNIQUE | | 104K| 11M| 15M| 3001 (1)| 00:00:01 | | 9 | TABLE ACCESS FULL| T2 | 104K| 11M| | 313 (1)| 00:00:01 | | 10 | TABLE ACCESS FULL | T2 | 104K| 11M| | 313 (1)| 00:00:01 | | 11 | SORT UNIQUE | | 104K| 11M| 15M| 3001 (1)| 00:00:01 | | 12 | TABLE ACCESS FULL | T1 | 104K| 11M| | 313 (1)| 00:00:01 | ---------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 7020 consistent gets 0 physical reads 0 redo size 1856 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 4 sorts (memory) 0 sorts (disk) 1 rows processed复制
问题是-我们能做得更好吗?数据库可以是最好的事情之一是哈希连接。因此 (以更多的SQL代码为代价),我们可以使用完整的外部连接语法进行类似的比较。在下面的示例中,我们假设表的逻辑主键是 “object_id”。
SQL> select * 2 from ( 3 select t1.object_id t1r, t2.object_id t2r 4 from t1 full outer join t2 5 on t1.object_id = t2.object_id 6 ) 7 where t1r is null or t2r is null; T1R T2R ---------- ---------- 92608 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 53297166 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 104K| 2653K| | 793 (1)| 00:00:01 | |* 1 | VIEW | VW_FOJ_0 | 104K| 2653K| | 793 (1)| 00:00:01 | |* 2 | HASH JOIN FULL OUTER| | 104K| 1020K| 1736K| 793 (1)| 00:00:01 | | 3 | TABLE ACCESS FULL | T1 | 104K| 510K| | 312 (1)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 104K| 510K| | 312 (1)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T1"."OBJECT_ID" IS NULL OR "T2"."OBJECT_ID" IS NULL) 2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3510 consistent gets 0 physical reads 0 redo size 605 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed复制
现在,这是更快的方式,并且 “工作”,因为它发现缺少 * 行 *,但它没有找到在源和目标中都存在键但数据不同的行。如果这是一部戏剧,那么我们可以使用非键列来增强查询,并使用一些解码进行比较。
SQL> SQL> select * 2 from ( 3 select t1.object_id t1r, t2.object_id t2r, t1.owner t1o, t2.owner t2o 4 from t1 full outer join t2 5 on t1.object_id = t2.object_id 6 ) 7 where 8 t1r is null or 9 t2r is null or 10 decode(t1o,t2o,1,0)=0; T1R T2R ---------- ---------- T1O ---------------------------------------------------------------------------------------------------- T2O ---------------------------------------------------------------------------------------------------- 92609 92609 SCOTT SCOTT2 92608 SCOTT 2 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 53297166 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 104K| 15M| | 863 (1)| 00:00:01 | |* 1 | VIEW | VW_FOJ_0 | 104K| 15M| | 863 (1)| 00:00:01 | |* 2 | HASH JOIN FULL OUTER| | 104K| 2449K| 2456K| 863 (1)| 00:00:01 | | 3 | TABLE ACCESS FULL | T1 | 104K| 1224K| | 312 (1)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 104K| 1224K| | 312 (1)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T1"."OBJECT_ID" IS NULL OR "T2"."OBJECT_ID" IS NULL OR DECODE("T1"."OWNER","T2"."OWNER",1,0)=0) 2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3511 consistent gets 0 physical reads 0 redo size 816 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed复制
所以我们可以实现要求,仍然使用联接。
需要注意的另一件事是,真正的 “源” 与 “目标” 比较的一大成本是,如果它们在不同的数据库上,那么我们必须在网络上拖动这些表中的一个。在这种情况下,您可能需要考虑一个潜在的折衷方案,即我们自己进行一些散列,因此我们仅将hashkey拖到电线上,例如
SQL> SQL> select * from 2 ( select rowid t1rid, ora_hash(t1.owner||t1.object_name||t1.object_id) t1colhash from t1) t1 3 full outer join 4 ( select rowid t2rid, ora_hash(t2.owner||t2.object_name||t2.object_id) t2colhash from t2) t2 5 on t1.t1colhash = t2.t2colhash 6 where t1colhash is null 7 or t2colhash is null; T1RID T1COLHASH T2RID T2COLHASH ------------------ ---------- ------------------ ---------- AAAyZkAAGAAAXOeAA0 2029513192 AAAyZjAAGAAAOCeAA0 1656564078 AAAyZjAAGAAAOCeAAz 583804402 3 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1042339821 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 109M| 5216M| | 1275 (23)| 00:00:01 | |* 1 | VIEW | VW_FOJ_0 | 109M| 5216M| | 1275 (23)| 00:00:01 | |* 2 | HASH JOIN FULL OUTER| | 109M| 5216M| 3776K| 1275 (23)| 00:00:01 | | 3 | VIEW | | 104K| 2551K| | 312 (1)| 00:00:01 | | 4 | TABLE ACCESS FULL | T1 | 104K| 5000K| | 312 (1)| 00:00:01 | | 5 | VIEW | | 104K| 2551K| | 312 (1)| 00:00:01 | | 6 | TABLE ACCESS FULL | T2 | 104K| 5000K| | 312 (1)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T1COLHASH" IS NULL OR "T2COLHASH" IS NULL) 2 - access("T1"."T1COLHASH"="T2"."T2COLHASH") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3511 consistent gets 0 physical reads 0 redo size 877 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed SQL> SQL> SQL>复制
最后,有时 “分层” 方法也许是进行 “足够好” 检查的一种手段。例如,您可能会做一些由所有者雕刻的事情,例如
SQL> select * from 2 ( select owner t1own, sum(ora_hash(owner||object_name||object_id)) t1colhash , count(*) t1cnt from t1 group by owner) 3 full outer join 4 ( select owner t2own, sum(ora_hash(owner||object_name||object_id)) t2colhash , count(*) t2cnt from t2 group by owner) 5 on t1.t1own = t2.t2own 6 where t1own is null 7 or t1own is null 8 or t1colhash != t2colhash 9 or t1cnt != t2cnt; T1OWN ---------------------------------------------------------------------------------------------------- T1COLHASH T1CNT ---------- ---------- T2OWN ---------------------------------------------------------------------------------------------------- T2COLHASH T2CNT ---------- ---------- SCOTT2 2029513192 1 SCOTT 4.6253E+10 21 SCOTT 4.4013E+10 19 2 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 632540738 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 42 | 7728 | 630 (2)| 00:00:01 | |* 1 | VIEW | VW_FOJ_0 | 42 | 7728 | 630 (2)| 00:00:01 | |* 2 | HASH JOIN FULL OUTER| | 42 | 7728 | 630 (2)| 00:00:01 | | 3 | VIEW | | 42 | 3864 | 315 (2)| 00:00:01 | | 4 | HASH GROUP BY | | 42 | 1554 | 315 (2)| 00:00:01 | | 5 | TABLE ACCESS FULL| T1 | 104K| 3775K| 312 (1)| 00:00:01 | | 6 | VIEW | | 42 | 3864 | 315 (2)| 00:00:01 | | 7 | HASH GROUP BY | | 42 | 1554 | 315 (2)| 00:00:01 | | 8 | TABLE ACCESS FULL| T2 | 104K| 3775K| 312 (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T1OWN" IS NULL OR "T1COLHASH"<>"T2COLHASH" OR "T1CNT"<>"T2CNT") 2 - access("T1"."T1OWN"="T2"."T2OWN") Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3510 consistent gets 0 physical reads 0 redo size 984 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed复制
这意味着网络上的数据最少,但是如果您 * 确实 * 发现差异,则需要进一步的工作。
希望这有所帮助。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
593次阅读
2025-04-15 17:24:06
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
560次阅读
2025-04-18 14:18:38
XTTS跨版本迁移升级方案(11g to 19c RAC for Linux)
zwtian
482次阅读
2025-04-08 09:12:48
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
472次阅读
2025-04-20 10:07:02
【ORACLE】记录一些ORACLE的merge into语句的BUG
DarkAthena
457次阅读
2025-04-22 00:20:37
【ORACLE】你以为的真的是你以为的么?--ORA-38104: Columns referenced in the ON Clause cannot be updated
DarkAthena
430次阅读
2025-04-22 00:13:51
Oracle 19c RAC更换IP实战,运维必看!
szrsu
430次阅读
2025-04-08 23:57:08
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
414次阅读
2025-04-17 17:02:24
火焰图--分析复杂SQL执行计划的利器
听见风的声音
358次阅读
2025-04-17 09:30:30
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
356次阅读
2025-04-15 14:48:05