数组是Presto中的一种数据类型。关键词是Array,表示数组,例如:Array[2,3]。
一、创建数组
1. 直接创建
操作符[]用于访问数组中的一个元素,并从1开始建立索引
select array[1,2,3][1]
--1复制
2.通过聚合函数
通过array_agg函数生成数组
array_agg
(x) → array<[same as input]>Returns an array created from the input
x
elements.
with tmp AS (
SELECT * FROM (
VALUES
('a', date'2020-01-01'),
('a', date'2019-10-02'),
('a', null),
('b', date'2020-01-01')
) AS t (id, ds)
)
, array_test as (
select id,array_agg(ds) as a from tmp
group by id
)
select * from array_test复制
结果是:
id | a |
---|---|
b | [2020-01-01] |
a | [2020-01-01, 2019-10-02, null] |
二、使用数组
null值的处理
with tmp AS (
SELECT * FROM (
VALUES
('a', date'2020-01-01'),
('a', date'2019-10-02'),
('a', null),
('b', date'2020-01-01'),
('c', null)
) AS t (id, ds)
)
, array_test as (
select id,array_agg(ds) as a from tmp
group by id
)
select * from array_test复制
得出来的结果
id | ds |
---|---|
a | [2020-01-01, 2019-10-02, null] |
b | [2020-01-01] |
c | [null] |
1. 对数组null值过滤
with tmp AS (
SELECT * FROM (
VALUES
('a', date'2020-01-01'),
('a', date'2019-10-02'),
('a', null),
('b', date'2020-01-01')
) AS t (id, ds)
)
, array_test as (
select id,filter(array_agg(ds), x -> x is not null) as a from tmp
group by id
)
select * from array_test复制
过滤null值后结果
id | a |
---|---|
a | [2020-01-01, 2019-10-02] |
b | [2020-01-01] |
2.求数组的长度
select id,a,cardinality(a) as length from array_test复制
id | a | length |
---|---|---|
a | [2020-01-01, 2019-10-02] | 2 |
b | [2020-01-01] | 1 |
3. 数组拼接
操作符||
或者concat
函数
select arrray[2] || array[3,4]
-- araay[2,3,4]
select concat(arrray[2], array[3,4])
-- araay[2,3,4]复制
和字符串拼接类似
4. 数组最值
最大值
select id,a,array_max(a) as ma from array_test复制
id | a | ma |
---|---|---|
a | [2020-01-01, 2019-10-02] | 2020/1/1 |
b | [2020-01-01] | 2020/1/1 |
最小值
select id,a,array_min(a) as ma from array_test复制
5. 增加元素
with tmp AS (
SELECT * FROM (
VALUES
('a', date'2020-01-01'),
('a', date'2019-10-02'),
('a', null),
('b', date'2020-01-01')
) AS t (id, ds)
)
, array_test as (
select id,filter(array_agg(ds), x -> x is not null) as a from tmp
group by id
)
, add_array_test as (
select id,a || date'2021-03-01' as a from array_test
)
select * from add_array_test复制
结果:
id | _col1 |
---|---|
a | [2020-01-01, 2019-10-02, 2021-03-01] |
b | [2020-01-01, 2021-03-01] |
6. 排序
select id,a, array_sort(a) as b from add_array_test复制
对数组进行排序
结果:
id | a | b |
---|---|---|
b | [2020-01-01, 2021-03-01] | [2020-01-01, 2021-03-01] |
a | [2020-01-01, 2019-10-02, 2021-03-01] | [2019-10-02, 2020-01-01, 2021-03-01] |
排序以后求最值
select id,a, element_at(array_sort(a),1) as mi, element_at(array_sort(a),-1) as ma from add_array_test复制
结果
id | a | mi | ma |
---|---|---|---|
a | [2020-01-01, 2019-10-02, 2021-03-01] | 2019/10/2 | 2021/3/1 |
b | [2020-01-01, 2021-03-01] | 2020/1/1 | 2021/3/1 |
7.包含某值
select * from add_array_test
where contains(a, date'2019-10-02')复制
结果
id | a |
---|---|
a | [2020-01-01, 2019-10-02, 2021-03-01] |
8.数据去重
with tmp AS (
SELECT * FROM (
VALUES
('a', date'2020-01-01'),
('a', date'2020-01-01'),
('a', date'2019-10-02'),
('a', null),
('b', date'2020-01-01')
) AS t (id, ds)
)
, array_test as (
select id,array_distinct(filter(array_agg(ds), x -> x is not null)) as a from tmp
group by id
)
select * from array_test复制
对数据进行去重,结果
id | a |
---|---|
a | [2020-01-01, 2019-10-02] |
b | [2020-01-01] |
9. 数据展开
with tmp2 as (
SELECT * FROM (
VALUES
(1, array[date'2021-01-04',date'2021-02-03']),
(3, null),
(4, array[date'2021-01-04'])
) AS t (id, day)
)
select * from tmp2
cross join unnest(day) as t(ds)复制
结果如下:
id | day | ds |
---|---|---|
1 | ['2021-01-04','2021-02-03'] | 2021/1/4 |
1 | ['2021-01-04','2021-02-03'] | 2021/2/3 |
4 | 2021/1/4 | 2021/1/4 |
注意:cross join 默认展开是数组不为null的,数据行数组为null,展开后没有数据
10. 利用数组行转列
WITH test_tab AS
(SELECT * FROM (
VALUES
('cn', 'a',1,1),
('cn', 'b',0,1),
('cn', 'c',1,0),
('cn', 'a,b',0,1),
('cn', 'a,c',1,0),
('cn', 'a,b,c',0,0)
) AS t (country, index_dim,is_new_user,is_login_user)
)
select
country
, index_dim
, is_new_user
, is_login_user
, n.name
--, array[if(is_new_user=1,'new_user'),if(is_login_user=1,'login_user')] as nt
from test_tab
cross join unnest(filter(array[if(is_new_user=1,'new_user'),if(is_login_user=1,'login_user')],x -> x is not null)) n(name)复制
三、常见注意事项
1. parquet格式 数组字段 可以插入null值,但是不能插入空数组array[]
java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at com.facebook.presto.hive.ParquetRecordWriterUtil$1.write(ParquetRecordWriterUtil.java:87)
at com.facebook.presto.hive.RecordFileWriter.appendRow(RecordFileWriter.java:170)
at com.facebook.presto.hive.RecordFileWriter.appendRows(RecordFileWriter.java:153)
at com.facebook.presto.hive.HiveWriter.append(HiveWriter.java:80)
at com.facebook.presto.hive.HivePageSink.writePage(HivePageSink.java:332)
at com.facebook.presto.hive.HivePageSink.doAppend(HivePageSink.java:284)
at com.facebook.presto.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:270)
at com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)
at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
at com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)
at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:91)
at com.facebook.presto.hive.HivePageSink.appendPage(HivePageSink.java:270)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:66)
at com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:264)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:387)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:284)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
at com.facebook.presto.operator.Driver.processFor(Driver.java:277)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
at com.facebook.presto.$gen.Presto_0_238_3_amzn_1____20210310_114716_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)复制
文章转载自SQL与大数据,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
【专家观点】罗敏:从理论到真实SQL,感受DeepSeek如何做性能优化
墨天轮编辑部
1153次阅读
2025-03-06 16:45:38
【专家有话说第五期】在不同年龄段,DBA应该怎样规划自己的职业发展?
墨天轮编辑部
1131次阅读
2025-03-13 11:40:53
2025年2月国产数据库大事记
墨天轮编辑部
868次阅读
2025-03-05 12:27:34
2025年2月国产数据库中标情况一览:GoldenDB 3500+万!达梦近千万!
通讯员
779次阅读
2025-03-06 11:40:20
2月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
401次阅读
2025-03-13 14:38:19
AI的优化能力,取决于你问问题的能力!
潇湘秦
387次阅读
2025-03-11 11:18:22
优炫数据库成功应用于国家电投集团青海海南州新能源电厂!
优炫软件
322次阅读
2025-03-21 10:34:08
达梦数据与法本信息签署战略合作协议
达梦数据
267次阅读
2025-03-06 09:26:57
IBM收购数据库厂商DataStax:瞄准向量和AI搜索
深度数据云
248次阅读
2025-02-28 12:04:04
国产化+性能王炸!这套国产方案让 3.5T 数据 5 小时“无感搬家”
YMatrix
247次阅读
2025-03-13 09:51:26