暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Presto SQL数组怎么使用

SQL与大数据 2021-09-26
5757

数组是Presto中的一种数据类型。关键词是Array,表示数组,例如:Array[2,3]。

一、创建数组

1. 直接创建

操作符[]用于访问数组中的一个元素,并从1开始建立索引

 select array[1,2,3][1]
 --1
复制

2.通过聚合函数

通过array_agg函数生成数组

array_agg
(x) → array<[same as input]>

Returns an array created from the input x
elements.

 with tmp AS (
 SELECT * FROM (
       VALUES
          ('a', date'2020-01-01'),
          ('a', date'2019-10-02'),
          ('a', null),
          ('b', date'2020-01-01')
  ) AS t (id, ds)
 )
 , array_test as (
 select id,array_agg(ds) as a from tmp
 group by id
 )
 select * from array_test
复制

结果是:

ida
b[2020-01-01]
a[2020-01-01, 2019-10-02, null]

二、使用数组

null值的处理

 with tmp AS (
 SELECT * FROM (
       VALUES
          ('a', date'2020-01-01'),
          ('a', date'2019-10-02'),
          ('a', null),
          ('b', date'2020-01-01'),
          ('c', null)
  ) AS t (id, ds)
 )
 , array_test as (
 select id,array_agg(ds) as a from tmp
 group by id
 )
 select *  from array_test
复制

得出来的结果

idds
a[2020-01-01, 2019-10-02, null]
b[2020-01-01]
c[null]

1. 对数组null值过滤

 with tmp AS (
 SELECT * FROM (
       VALUES
          ('a', date'2020-01-01'),
          ('a', date'2019-10-02'),
          ('a', null),
          ('b', date'2020-01-01')
  ) AS t (id, ds)
 )
 , array_test as (
 select id,filter(array_agg(ds), x -> x is not null) as a from tmp
 group by id
 )
 select *  from array_test
复制

过滤null值后结果

ida
a[2020-01-01, 2019-10-02]
b[2020-01-01]

2.求数组的长度

 select id,a,cardinality(a) as length  from array_test
复制
idalength
a[2020-01-01, 2019-10-02]2
b[2020-01-01]1

3. 数组拼接

操作符||
或者concat
函数

 select arrray[2] || array[3,4]
 -- araay[2,3,4]
 
 select concat(arrray[2], array[3,4])
 -- araay[2,3,4]
复制

和字符串拼接类似

4. 数组最值

最大值

 select id,a,array_max(a) as ma  from array_test
复制
idama
a[2020-01-01, 2019-10-02]2020/1/1
b[2020-01-01]2020/1/1

最小值

 select id,a,array_min(a) as ma  from array_test
复制

5. 增加元素

 with tmp AS (
 SELECT * FROM (
       VALUES
          ('a', date'2020-01-01'),
          ('a', date'2019-10-02'),
          ('a', null),
          ('b', date'2020-01-01')
  ) AS t (id, ds)
 )
 , array_test as (
 select id,filter(array_agg(ds), x -> x is not null) as a from tmp
 group by id
 )
 , add_array_test as (
 select id,a || date'2021-03-01' as a from array_test  
 )
 select * from add_array_test
复制

结果:

id_col1
a[2020-01-01, 2019-10-02, 2021-03-01]
b[2020-01-01, 2021-03-01]

6. 排序

 select id,a, array_sort(a) as b from add_array_test
复制

对数组进行排序

结果:

idab
b[2020-01-01, 2021-03-01][2020-01-01, 2021-03-01]
a[2020-01-01, 2019-10-02, 2021-03-01][2019-10-02, 2020-01-01, 2021-03-01]

排序以后求最值

 select id,a, element_at(array_sort(a),1) as mi, element_at(array_sort(a),-1) as ma from add_array_test
复制

结果

idamima
a[2020-01-01, 2019-10-02, 2021-03-01]2019/10/22021/3/1
b[2020-01-01, 2021-03-01]2020/1/12021/3/1

7.包含某值

 select * from add_array_test
 where contains(a, date'2019-10-02')
复制

结果

ida
a[2020-01-01, 2019-10-02, 2021-03-01]

8.数据去重

 with tmp AS (
 SELECT * FROM (
       VALUES
          ('a', date'2020-01-01'),
          ('a', date'2020-01-01'),
          ('a', date'2019-10-02'),
          ('a', null),
          ('b', date'2020-01-01')
  ) AS t (id, ds)
 )
 , array_test as (
 select id,array_distinct(filter(array_agg(ds), x -> x is not null)) as a from tmp
 group by id
 )
 select * from array_test
复制

对数据进行去重,结果

ida
a[2020-01-01, 2019-10-02]
b[2020-01-01]

9. 数据展开

 with tmp2 as (
 SELECT * FROM (
       VALUES
          (1, array[date'2021-01-04',date'2021-02-03']),
          (3, null),
          (4, array[date'2021-01-04'])
  ) AS t (id, day)
 )
 select * from tmp2
 cross join unnest(day) as t(ds)
复制

结果如下:

iddayds
1['2021-01-04','2021-02-03']2021/1/4
1['2021-01-04','2021-02-03']2021/2/3
42021/1/42021/1/4

注意:cross join 默认展开是数组不为null的,数据行数组为null,展开后没有数据

10. 利用数组行转列

 WITH test_tab AS
  (SELECT * FROM (
       VALUES
 ('cn', 'a',1,1),
 ('cn', 'b',0,1),
 ('cn', 'c',1,0),
 ('cn', 'a,b',0,1),
 ('cn', 'a,c',1,0),
 ('cn', 'a,b,c',0,0)
  ) AS t (country, index_dim,is_new_user,is_login_user)
 )
 select
  country
  , index_dim
  , is_new_user
  , is_login_user
  , n.name
   --, array[if(is_new_user=1,'new_user'),if(is_login_user=1,'login_user')] as nt
 from test_tab
 cross join unnest(filter(array[if(is_new_user=1,'new_user'),if(is_login_user=1,'login_user')],x -> x is not null)) n(name)
复制

三、常见注意事项

1. parquet格式 数组字段 可以插入null值,但是不能插入空数组array[]

 java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
 at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
 at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
 at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
  at com.facebook.presto.hive.ParquetRecordWriterUtil$1.write(ParquetRecordWriterUtil.java:87)
 at com.facebook.presto.hive.RecordFileWriter.appendRow(RecordFileWriter.java:170)
 at com.facebook.presto.hive.RecordFileWriter.appendRows(RecordFileWriter.java:153)
 at com.facebook.presto.hive.HiveWriter.append(HiveWriter.java:80)
 at com.facebook.presto.hive.HivePageSink.writePage(HivePageSink.java:332)
 at com.facebook.presto.hive.HivePageSink.doAppend(HivePageSink.java:284)
  at com.facebook.presto.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:270)
  at com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)
  at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
 at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
 at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
 at com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)
 at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:91)
 at com.facebook.presto.hive.HivePageSink.appendPage(HivePageSink.java:270)
 at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:66)
 at com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:264)
 at com.facebook.presto.operator.Driver.processInternal(Driver.java:387)
  at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:284)
 at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
 at com.facebook.presto.operator.Driver.processFor(Driver.java:277)
  at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
 at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
  at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
  at com.facebook.presto.$gen.Presto_0_238_3_amzn_1____20210310_114716_1.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
复制



文章转载自SQL与大数据,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论