暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle 需要SQL来标准化地址

ASKTOM 2021-02-25
407

问题描述

嗨,汤姆和团队,
我真的很感谢你对我们所有人的帮助。
我们正在标准化客户的地址。
我有一个包含客户数据的主t1omer表。
我有一个映射表,其中包含最终值到多个值的映射。
我需要获取与multiple_values相对应的final_value。
如果标准化后地址长度超过35个字符,
然后从字符串的右侧到末尾的上一个空格 (“”),将其创建为address2字段
您能否为sql或pl/sql提供帮助?
使用Oracle 12c数据库。


客户表:

cust_id address
10  9 Help Street, Level 4
11  22 Victoria Street
12  1495 Franklin Str
13  30 Hasivim St.,Petah-Tikva
14  2 Jakaranda St
15  61, Science Park Rd
16  61, Social park road
17  Av. Hermanos Escobar 5756
18  Ave. Hermanos Escobar 5756
19  8000 W FLORISSANT AVE
20  8600 MEMORIAL PKWY SW
21  8200 FLORISSANTMEMORIALWAYABOVE SW
22  8600 MEMORIALFLORISSANT PKWY SW

create table t1
  (
   cust_id number,
   address varchar2(100)
  );

Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');


--------------

映射表:

id  final_value multiple_values
1   St      Street
2   St      St.
3   St      Str
4   St      St
5   Rd      Rd.
6   Rd      road
7   Av      Av.
8   Av      Ave.
9   Av      Avenue
10   Av      Aven.
11   West    W
12  South West  SW

create table t2
  (
  id number,
  final_vaue varchar2(50),
  multiple_values varchar2(50)
  );

insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');


------------

预期产出:

cust_id address
10  9 Help St, Level 4
11  22 Victoria St
12  1495 Franklin St
13  30 Hasivim St ,Petah-Tikva
14  2 Jakaranda St
15  61, Science Park Rd
16  61, Social park Rd
17  Av Hermanos Escobar 5756
18  Av Hermanos Escobar 5756
19  8000 West FLORISSANT Ave
20  8600 MEMORIAL PKWY South West


如果地址的长度超过35个字符,则预期输出为:

cust_id address                              address2
21  8200 FLORISSANTMEMORIALWAYABOVE      South West
22  8600 MEMORIALFLORISSANT PKWY         South West


感谢您的所有帮助

专家解答

这是一种解决方法:

-将地址拆分为每个单词/组件的行
-外部将映射表连接到这些行,如果有匹配,则返回最终值,如果没有匹配,则返回地址字符串
-使用match_regnize将其拆分为 <35个字符的地址字符串组https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg
-将pivot与listagg一起使用,将组转换为地址1、2、3等列:https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot

这给出了类似的东西:

create table t1 (
  cust_id number,
  address varchar2(100)
);

create table t2 (
  id number,
  final_value varchar2(50),
  multiple_values varchar2(50)
);

Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');

insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');
insert into t2 values(13,'South West','SW');

commit;

with words as (
  select cust_id,
         regexp_substr ( address, '[^ ]+', 1, l ) word,
         l
  from   t1, lateral ( 
    select level l from dual
    connect by level <= length ( address ) - 
      length ( replace ( address, ' ' ) ) + 1
  )
), replacements as (
  select cust_id, nvl ( final_value, word ) w, l
  from   words
  left   join t2
  on     word = multiple_values
), grps as (
  select * from replacements
    match_recognize (
      partition by cust_id
      order by l
      measures 
        match_number() grp
      all rows per match
      pattern ( str+ ) 
      define 
        str as sum ( length ( w || ' ' ) ) < 35
    )
)
  select * from grps
  pivot (
    listagg ( w, ' ' ) within group ( order by l )
    for grp in ( 1 as address1, 2 as address2 )
  );

CUST_ID    ADDRESS1                           ADDRESS2     
        10 9 Help Street, Level 4                     
        11 22 Victoria St                             
        12 1495 Franklin St                           
        13 61, Science Park Rd                        
        14 61, Social park Rd                         
        15 Av Hermanos Escobar 5756                   
        16 Av Hermanos Escobar 5756                   
        17 8000 West FLORISSANT AVE                   
        18 8600 MEMORIAL PKWY South West              
        19 8200 FLORISSANTMEMORIALWAYABOVE    South West    
        20 8600 MEMORIALFLORISSANT PKWY       South West


这里还有一些工作要做,以确保你正确处理标点符号,但这应该足以让你开始
文章转载自ASKTOM,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论