暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Hive-数据分析-posexplode计算日期序列

趣说大数据 2021-09-17
1352
01

需求分析


将某个时间段之间的日期补全,如列出2021-01-25到2021-11-26的所有月份。(包含两端)。
input: 2021-01-25,2021-11-26
output:
02021-01-01
1
2021-02-01
22021-03-01
32021-04-01
42021-05-01
52021-06-01
6
2021-07-01
7
2021-08-01
8
2021-09-01
9
2021-10-01
10
2021-11-01

02

Hive函数


  • space(n) - returns n spaces (输入一个整数n返回n个空格)

select space(10);
  • posexplode(a) - behaves like explode for arrays, but includes the position of items in the original array (类似于爆破函数,返回的结果为index,item)

  • months_between(date1, date2, roundOff) - returns number of months between dates date1 and date2 (返回两个日期之间的月份差值)

select months_between('2021-12-01','2021-01-01');
  • add_months(start_date, num_months) - Returns the date that is num_months after start_date  (在指定的日期基础上加上指定月数)

select add_months('2021-01-01',1);
  • trunc(date, fmt) 

       date with the time portion of the day truncated to the unit specified by the format model fmt. If you omit fmt, then date is truncated to the nearest day. It currently only supports 'MONTH'/'MON'/'MM', 'QUARTER'/'Q' and 'YEAR'/'YYYY'/'YY' as format (格式化日期)

select trunc('2021-01-25','MM')//2021-01-01 
select trunc('2021-02-25','YY'); //2021-01-01
  • abs(x) - returns the absolute value of x 返回x的绝对值


03

Hive实现


  • 数据准备,日期格式化

select 
trunc('2021-01-25','MM') as start_month,
trunc('2021-11-26','MM') as end_month,
abs (cast(months_between(trunc('2021-01-25','MM'),trunc('2021-11-26','MM')) as int )) as num_month
  • 需求实现

select
index_num,
add_months(start_month,index_num) as month
from
(
select
trunc('2021-01-25','MM') as start_month,
trunc('2021-11-26','MM') as end_month,
abs (cast(months_between(trunc('2021-01-25','MM'),trunc('2021-11-26','MM')) as int )) as num_month
)tmp
lateral view posexplode(split(space(num_month)," ")) k as index_num,data
  • 运行测试


文章转载自趣说大数据,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论