问题描述
嗨,汤姆和团队,
我真的很感谢你对我们所有人的帮助。
我们正在标准化客户的地址。
我有一个包含客户数据的主t1omer表。
我有一个映射表,其中包含最终值到多个值的映射。
我需要获取与multiple_values相对应的final_value。
如果标准化后地址长度超过35个字符,
然后从字符串的右侧到末尾的上一个空格 (“”),将其创建为address2字段
您能否为sql或pl/sql提供帮助?
使用Oracle 12c数据库。
客户表:
--------------
映射表:
------------
预期产出:
如果地址的长度超过35个字符,则预期输出为:
感谢您的所有帮助
我真的很感谢你对我们所有人的帮助。
我们正在标准化客户的地址。
我有一个包含客户数据的主t1omer表。
我有一个映射表,其中包含最终值到多个值的映射。
我需要获取与multiple_values相对应的final_value。
如果标准化后地址长度超过35个字符,
然后从字符串的右侧到末尾的上一个空格 (“”),将其创建为address2字段
您能否为sql或pl/sql提供帮助?
使用Oracle 12c数据库。
客户表:
cust_id address 10 9 Help Street, Level 4 11 22 Victoria Street 12 1495 Franklin Str 13 30 Hasivim St.,Petah-Tikva 14 2 Jakaranda St 15 61, Science Park Rd 16 61, Social park road 17 Av. Hermanos Escobar 5756 18 Ave. Hermanos Escobar 5756 19 8000 W FLORISSANT AVE 20 8600 MEMORIAL PKWY SW 21 8200 FLORISSANTMEMORIALWAYABOVE SW 22 8600 MEMORIALFLORISSANT PKWY SW create table t1 ( cust_id number, address varchar2(100) ); Insert into t1 values(10,'9 Help Street, Level 4'); Insert into t1 values(11,'22 Victoria Street'); Insert into t1 values(12,'1495 Franklin Str'); Insert into t1 values(13,'61, Science Park Rd'); Insert into t1 values(14,'61, Social park road'); Insert into t1 values(15,'Av. Hermanos Escobar 5756'); Insert into t1 values(16,'Ave. Hermanos Escobar 5756'); Insert into t1 values(17,'8000 W FLORISSANT AVE'); Insert into t1 values(18,'8600 MEMORIAL PKWY SW'); Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW'); Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');
--------------
映射表:
id final_value multiple_values 1 St Street 2 St St. 3 St Str 4 St St 5 Rd Rd. 6 Rd road 7 Av Av. 8 Av Ave. 9 Av Avenue 10 Av Aven. 11 West W 12 South West SW create table t2 ( id number, final_vaue varchar2(50), multiple_values varchar2(50) ); insert into t2 values(1,'St','Street'); insert into t2 values(2,'St','St.'); insert into t2 values(3,'St','Str'); insert into t2 values(4,'St','St'); insert into t2 values(5,'Rd','Rd.'); insert into t2 values(6,'Rd','road'); insert into t2 values(7,'Av','Av.'); insert into t2 values(8,'Av','Ave.'); insert into t2 values(9,'Av','Avenue'); insert into t2 values(10,'Av','Aven.'); insert into t2 values(11,'West','W'); insert into t2 values(12,'South West','SW.');
------------
预期产出:
cust_id address 10 9 Help St, Level 4 11 22 Victoria St 12 1495 Franklin St 13 30 Hasivim St ,Petah-Tikva 14 2 Jakaranda St 15 61, Science Park Rd 16 61, Social park Rd 17 Av Hermanos Escobar 5756 18 Av Hermanos Escobar 5756 19 8000 West FLORISSANT Ave 20 8600 MEMORIAL PKWY South West
如果地址的长度超过35个字符,则预期输出为:
cust_id address address2 21 8200 FLORISSANTMEMORIALWAYABOVE South West 22 8600 MEMORIALFLORISSANT PKWY South West
感谢您的所有帮助
专家解答
这是一种解决方法:
-将地址拆分为每个单词/组件的行
-外部将映射表连接到这些行,如果有匹配,则返回最终值,如果没有匹配,则返回地址字符串
-使用match_regnize将其拆分为 <35个字符的地址字符串组https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg
-将pivot与listagg一起使用,将组转换为地址1、2、3等列:https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot
这给出了类似的东西:
这里还有一些工作要做,以确保你正确处理标点符号,但这应该足以让你开始
-将地址拆分为每个单词/组件的行
-外部将映射表连接到这些行,如果有匹配,则返回最终值,如果没有匹配,则返回地址字符串
-使用match_regnize将其拆分为 <35个字符的地址字符串组https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg
-将pivot与listagg一起使用,将组转换为地址1、2、3等列:https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot
这给出了类似的东西:
create table t1 (
cust_id number,
address varchar2(100)
);
create table t2 (
id number,
final_value varchar2(50),
multiple_values varchar2(50)
);
Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');
insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');
insert into t2 values(13,'South West','SW');
commit;
with words as (
select cust_id,
regexp_substr ( address, '[^ ]+', 1, l ) word,
l
from t1, lateral (
select level l from dual
connect by level <= length ( address ) -
length ( replace ( address, ' ' ) ) + 1
)
), replacements as (
select cust_id, nvl ( final_value, word ) w, l
from words
left join t2
on word = multiple_values
), grps as (
select * from replacements
match_recognize (
partition by cust_id
order by l
measures
match_number() grp
all rows per match
pattern ( str+ )
define
str as sum ( length ( w || ' ' ) ) < 35
)
)
select * from grps
pivot (
listagg ( w, ' ' ) within group ( order by l )
for grp in ( 1 as address1, 2 as address2 )
);
CUST_ID ADDRESS1 ADDRESS2
10 9 Help Street, Level 4
11 22 Victoria St
12 1495 Franklin St
13 61, Science Park Rd
14 61, Social park Rd
15 Av Hermanos Escobar 5756
16 Av Hermanos Escobar 5756
17 8000 West FLORISSANT AVE
18 8600 MEMORIAL PKWY South West
19 8200 FLORISSANTMEMORIALWAYABOVE South West
20 8600 MEMORIALFLORISSANT PKWY South West 这里还有一些工作要做,以确保你正确处理标点符号,但这应该足以让你开始
文章转载自ASKTOM,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




