01
—
场景描述
现有用户扫描或连接 WiFi 记录表 user_wifi_log ,每一行数据表示某时刻用户扫描或连接 WiFi 的日志。

02
—
数据准备
with user_wifi_log as(select stack(9,'2024-01-01 10:01:00', '101', 'cmcc-Starbucks', 'scan' -- 扫描','2024-01-01 10:02:00', '101', 'cmcc-Starbucks', 'scan','2024-01-01 10:03:00', '101', 'cmcc-Starbucks', 'scan','2024-01-01 10:04:00', '101', 'cmcc-Starbucks', 'conn' -- 连接,'2024-01-01 10:05:00', '101', 'cmcc-Starbucks', 'conn','2024-01-01 10:06:00', '101', 'cmcc-Starbucks', 'conn','2024-01-01 11:01:00', '101', 'cmcc-Starbucks', 'conn','2024-01-01 11:02:00', '101', 'cmcc-Starbucks', 'conn','2024-01-01 11:03:00', '101', 'cmcc-Starbucks', 'conn')-- 字段:时间,用户,WiFi,状态(扫描、连接)as (dt, user_id, wifi, status))select * from user_wifi_log;

03
—
问题分析
本题本质上是断点分组的思想应用
select dt, user_id, wifi, status, lag_dt, lag_status, if(lag_status is nullor lag_dt is nullor status <> lag_statusor unix_timestamp(dt) - unix_timestamp(lag_dt) > 60 * 30, 1, 0) break_pointfrom (select *,lag(dt, 1) over (partition by user_id, wifi order by dt) as lag_dt,lag(status, 1) over (partition by user_id, wifi order by dt) as lag_statusfrom user_wifi_log) t1

第二步:构建分组标志
select dt, user_id, wifi, status, lag_dt, lag_status, sum(if(lag_status is nullor lag_dt is nullor status <> lag_statusor unix_timestamp(dt) - unix_timestamp(lag_dt) > 60 * 30, 1, 0) ) over(order by dt) break_pointfrom (select *,lag(dt, 1) over (partition by user_id, wifi order by dt) as lag_dt,lag(status, 1) over (partition by user_id, wifi order by dt) as lag_statusfrom user_wifi_log) t1

步骤三:分组内求出最大,最小时间,即为开始,结束时间
select user_id,wifi,status,min(dt) as start_time,max(dt) as end_tiemfrom(select dt, user_id, wifi, status, lag_dt, lag_status, sum(if(lag_status is nullor lag_dt is nullor status <> lag_statusor unix_timestamp(dt) - unix_timestamp(lag_dt) > 60 * 30, 1, 0)) over (order by dt) grp_flgfrom (select *,lag(dt, 1) over (partition by user_id, wifi order by dt) as lag_dt,lag(status, 1) over (partition by user_id, wifi order by dt) as lag_statusfrom user_wifi_log) t1) t2group by user_id, wifi,status, grp_flg

04
—
小结
本题本质上断点分组思想的应用,懂此技巧的同学应该解决该问题不难,与本题类似的题目有
SQL进阶技巧:用户历史最大连续签到天数问题| 断点分组问题

往期精彩
SQL进阶技巧:如何对数据进行两两组合分析?| 广告策略投放转化问题
SQL进阶技巧:如何利用Grouping_id逆向还原任意聚合结果下的维度列簇名称?| 多维分析应用
SQL进阶技巧:断点缝合问题【如何按照业务规则对相邻行数据进行合并】
SQL进阶技巧:时点值状态统计如何分析?【某时点旅店客人在住房间数量统计】
文章转载自会飞的一十六,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。





