问题描述
Hello I working on the doing an application to customer match , where an application can have multiple customers (corporate) and customer can have multiple applications. so i am doing a cartesian join and using UTL_MATCH .JARO_WINKLER_SMILIARTY for string matching. Table A (applications) Approximatly 20,000 per day Table B (CusomterPfile) approximatly 400,000 per month. I have to match apps for month, so intermeidate volume is 20,000*400,000 = 8,000,000,000 and final data set with JW_SCORE>970 Its time consuming to create the table of 8billion and fillter for200K Approx having jw_score>970. And also to filter in sub querry. please suggest a better approach . CREATE TABLE tablea ( APP_ID NUMBER(14), APP_LOAD_DATE DATE, SSN VARCHAR2(20 BYTE), DOB DATE, FIRST_NAME VARCHAR2(50 BYTE), LAST_NAME VARCHAR2(50 BYTE), ADDR VARCHAR2(92 BYTE), CITY VARCHAR2(50 BYTE), STATE VARCHAR2(20 BYTE), ZIP VARCHAR2(20 BYTE) ); CREATE TABLE tableb ( ACCOUNT_NUMBER NUMBER(9) NOT NULL, DOB DATE, SSN NUMBER(9), FIRST_NAME VARCHAR2(32 BYTE), LAST_NAME VARCHAR2(60 BYTE), ADDR VARCHAR2(100 BYTE), CITY VARCHAR2(26 BYTE), STATE CHAR(2 BYTE), ZIP CHAR(5 BYTE), ZIP_4 CHAR(4 BYTE) ); SELECT A.* , JW_SSN+JW_DOB+JW_FNAME+JW_LNAME+JW_ADDR AS JW_SCORE FROM ( SELECT /*+ PARALLEL(16) */ a.appid , a.ssn a_ssn , b.ssn b_ssn , a.dob AS a_dob , b.dob AS b_dob , a.first_name AS a_first_name , b.first_name AS b_first_name , a.last_name AS a_last_name , b.last_name AS b_last_name , A.FIRST_NAME ||' '|| A.LAST_NAME AS A_FULLNAME , B.FIRST_NAME ||' ' ||B.LAST_NAME AS B_FULLNAME , a.addr AS a_addr , b.addr b_addr , a.city AS a_city , b.city b_city , a.state AS a_state , b.state b_state , a.zip AS a_zip , b.zip b_zip , ( CASE WHEN ( A.ssn IS NULL or B.SSN IS NULL ) THEN 100 WHEN ( ( A.SSN = 111111111 OR B.SSN = 111111111)) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( LPAD( A.ssn,9,0) , LPAD( b.ssn ,9,0)) END)* 3 as JW_SSN ,( CASE WHEN ( A.dob IS NULL OR B.dob IS NULL ) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( A.dob ,B.dob) END )*2 as JW_DOB ,( CASE WHEN ( A.first_name IS NULL OR B.first_name IS NULL ) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( A.first_name ,B.first_name) END ) *2as JW_FNAME , (CASE WHEN ( A.last_name IS NULL OR B.last_name IS NULL ) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( A.last_name ,B.last_name) END) *2 as JW_LNAME , CASE WHEN ( A.FIRST_NAME || A.LAST_NAME IS OR AND B.FIRST_NAME||B.LAST_NAME IS NULL ) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( A.FIRST_NAME ||' '|| A.LAST_NAME , B.FIRST_NAME ||' ' ||B.LAST_NAME) END as JW_FULLNAME , ( CASE WHEN ( A.addr IS NULL AND B.addr IS NULL ) THEN 100 ELSE UTL_MATCH.jaro_winkler_similarity ( A.addr ,B.addr) END) * 1 as JW_ADDR FROM tablea a, tableb b WHERE 1 = 1 AND b.start_service_date >= a.app_load_date ) A WHERE JW_SSN+JW_DOB+JW_FNAME+JW_LNAME+JW_ADDR > 970; Plan hash value: 1977758127 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 733M| 242G| | 5713 (98)| 00:00:12 | | | | | 1 | PX COORDINATOR FORCED SERIAL| | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 733M| 242G| | 5713 (98)| 00:00:12 | Q1,01 | P->S | QC (RAND) | | 3 | MERGE JOIN | | 733M| 242G| | 5713 (98)| 00:00:12 | Q1,01 | PCWP | | | 4 | SORT JOIN | | 479K| 72M| 170M| 74 (5)| 00:00:01 | Q1,01 | PCWP | | | 5 | PX RECEIVE | | 479K| 72M| | 71 (0)| 00:00:01 | Q1,01 | PCWP | | | 6 | PX SEND BROADCAST | :TQ10000 | 479K| 72M| | 71 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 7 | PX BLOCK ITERATOR | | 479K| 72M| | 71 (0)| 00:00:01 | Q1,00 | PCWC | | | 8 | TABLE ACCESS FULL | JW_BAN_TABLE | 479K| 72M| | 71 (0)| 00:00:01 | Q1,00 | PCWP | | |* 9 | FILTER | | | | | | | Q1,01 | PCWP | | |* 10 | SORT JOIN | | 611K| 114M| 258M| 94 (5)| 00:00:01 | Q1,01 | PCWP | | | 11 | PX BLOCK ITERATOR | | 611K| 114M| | 91 (2)| 00:00:01 | Q1,01 | PCWC | | | 12 | TABLE ACCESS FULL | JW_APP | 611K| 114M| | 91 (2)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 9 - filter(CASE WHEN (("A"."SSN" IS NULL) OR ("B"."SSN" IS NULL)) THEN 100 WHEN ((TO_NUMBER("A"."SSN")=111111111) OR ("B"."SSN"=111111111)) THEN 100 ELSE "UTL_MATCH"."JARO_WINKLER_SIMILARITY"(LPAD("A"."SSN",9,'0'),LPAD(TO_CHAR("B"."SSN"),9 ,'0')) END *3+CASE WHEN (("A"."DOB" IS NULL) OR ("B"."DOB" IS NULL)) THEN 100 ELSE "UTL_MATCH"."JARO_WINKLER_SIMILARITY"(INTERNAL_FUNCTION("A"."DOB"),INTERNAL_FUNCTION("B"."DOB")) END *2+CASE WHEN (("A"."FIRST_NAME" IS NULL) OR ("B"."FIRST_NAME" IS NULL)) THEN 100 ELSE "UTL_MATCH"."JARO_WINKLER_SIMILARITY"("A"."FIRST_NAME","B"."FIRST_NAME") END *2+CASE WHEN (("A"."LAST_NAME" IS NULL) OR ("B"."LAST_NAME" IS NULL)) THEN 100 ELSE "UTL_MATCH"."JARO_WINKLER_SIMILARITY"("A"."LAST_NAME","B"."LAST_NAME") END *2+CASE WHEN (("A"."ADDR" IS NULL) AND ("B"."ADDR" IS NULL)) THEN 100 ELSE "UTL_MATCH"."JARO_WINKLER_SIMILARITY"("A"."ADDR","B"."ADDR") END *1>970) 10 - access(INTERNAL_FUNCTION("B"."START_SERVICE_DATE")>=INTERNAL_FUNCTION("A"."APP_LOAD_DATE")) filter(INTERNAL_FUNCTION("B"."START_SERVICE_DATE")>=INTERNAL_FUNCTION("A"."APP_LOAD_DATE")) Note ----- - Degree of Parallelism is 16 because of hint复制
提前感谢
专家解答
您需要 “更多” 才能加入,以通过加入更好地减少行。
因此,您可以提供一些您可能正在通过数据库进行的假设。例如,您可能会假设我们将匹配每个名称的首字母以及它们所处的状态。
和b.开始服务日期> = a.应用程序加载日期
和substr(b.first_name,1,1) = substr(a.first_name,1,1)
和substr(b.last_name,1,1) = substr(a.last_name,1,1)
和b.状态 = a.状态
因此,您可以提供一些您可能正在通过数据库进行的假设。例如,您可能会假设我们将匹配每个名称的首字母以及它们所处的状态。
和b.开始服务日期> = a.应用程序加载日期
和substr(b.first_name,1,1) = substr(a.first_name,1,1)
和substr(b.last_name,1,1) = substr(a.last_name,1,1)
和b.状态 = a.状态
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
595次阅读
2025-04-15 17:24:06
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
563次阅读
2025-04-18 14:18:38
XTTS跨版本迁移升级方案(11g to 19c RAC for Linux)
zwtian
482次阅读
2025-04-08 09:12:48
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
473次阅读
2025-04-20 10:07:02
【ORACLE】记录一些ORACLE的merge into语句的BUG
DarkAthena
457次阅读
2025-04-22 00:20:37
【ORACLE】你以为的真的是你以为的么?--ORA-38104: Columns referenced in the ON Clause cannot be updated
DarkAthena
431次阅读
2025-04-22 00:13:51
Oracle 19c RAC更换IP实战,运维必看!
szrsu
430次阅读
2025-04-08 23:57:08
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
415次阅读
2025-04-17 17:02:24
火焰图--分析复杂SQL执行计划的利器
听见风的声音
359次阅读
2025-04-17 09:30:30
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
357次阅读
2025-04-15 14:48:05