暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

hive与postgresql 之爆炸函数

大数据启示录 2022-02-15
1075

1:postgresql炸裂详解

炸裂函数regexp_split_to_table
select regexp_split_to_table('飞机,火车,地铁,汽车', ',' ) 以逗号切分,转为数据集
select regexp_split_to_array('飞机,火车,地铁,汽车', ',' ) 转为数组


select (regexp_split_to_array('飞机,火车,地铁,汽车', ',' ))[1] 取数组的第二个元素
select regexp_split_to_table('F:\QH本部文件\一套表部署相关\test.sh','\\') 正则匹配


array_agg(expression) 把表达式变成一个数组 一般配合 array_to_string() 函数使用
select nameid, array_agg(traffic ) from dbscheme.test0001 group by nameid order by nameid ; 变为数组


string_agg(expression, delimiter) 直接把一个表达式变成字符串
select nameid, string_agg(traffic,',') , update_time from dbscheme.test0001
group by nameid,update_time order by nameid,update_time; //相同id 的连接到一起,逗号分隔


select nameid, array_to_string(array_agg(traffic),',') from dbscheme.test0001 group by nameid order by nameid ; .数组转字符串

2:hive炸裂详解

+----------+----------------------+--+
| a.id | a.tim |
+----------+----------------------+--+
| a,b,c,d | 2:00,3:00,4:00,5:00 |
| f,b,c,d | 1:10,2:20,3:30,4:40 |
+----------+----------------------+--+

explode 炸裂函数,一列变多行。

select id,tim,single_tim 
from atlasdemo.a
lateral view explode(split(tim,',')) t as single_tim
+----------+----------------------+-------------+--+
| id | tim | single_tim |
+----------+----------------------+-------------+--+
| a,b,c,d | 2:00,3:00,4:00,5:00 | 2:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 3:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 4:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 5:00 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 1:10 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 2:20 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 3:30 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 4:40 |
+----------+----------------------+-------------+--+

posexplode炸裂除了会炸开数组/map,还会对应生成索引下标。

select id,tim,single_id_index,single_id 
from atlasdemo.a
lateral view posexplode(split(id,',')) t as single_id_index, single_id;
+----------+----------------------+------------------+------------+--+
| id | tim | single_id_index | single_id |
+----------+----------------------+------------------+------------+--+
| a,b,c,d | 2:00,3:00,4:00,5:00 | 0 | a |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 1 | b |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 2 | c |
| a,b,c,d | 2:00,3:00,4:00,5:00 | 3 | d |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 0 | f |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 1 | b |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 2 | c |
| f,b,c,d | 1:10,2:20,3:30,4:40 | 3 | d |
+----------+----------------------+------------------+------------+--+

如果想实现对两列听同事进行多行转换,那么用explode()函数就不能实现了,

但可以用posexplode()函数,因为该函数可以将index和数据都取出来,

使用两次posexplode并令两次取到的index相等就行了。

select id,tim,single_id,single_tim from atlasdemo.a 
lateral view posexplode(split(id,',')) t as single_id_index, single_id
lateral view posexplode(split(tim,',')) t as single_yim_index, single_tim
where single_id_index = single_yim_index;
+----------+----------------------+------------+-------------+--+
| id | tim | single_id | single_tim |
+----------+----------------------+------------+-------------+--+
| a,b,c,d | 2:00,3:00,4:00,5:00 | a | 2:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | b | 3:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | c | 4:00 |
| a,b,c,d | 2:00,3:00,4:00,5:00 | d | 5:00 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | f | 1:10 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | b | 2:20 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | c | 3:30 |
| f,b,c,d | 1:10,2:20,3:30,4:40 | d | 4:40 |
+----------+----------------------+------------+-------------+--+

文章转载自大数据启示录,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论