暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle 如何创建带有分析功能的缺失记录

askTom 2017-07-14
286

问题描述

嗨,AskTom团队,

我在找出执行以下操作的查询时遇到了一些麻烦:

我有一个由外部系统填充的暂存表。此表存储有关一天内商品销售多少的信息。如果某件物品在一天内没有售出任何东西,外部系统将创建一条售出数量 = 0的记录。

-- small sample for staging table (sales until day 10)
create table sales_staging as 
select * from (
-- item 1:
select 1 item, 1  day_of_month, 20 qty_sold from dual union all
select 1 item, 2  day_of_month,  0 qty_sold from dual union all
select 1 item, 3  day_of_month, 10 qty_sold from dual union all
select 1 item, 4  day_of_month, 20 qty_sold from dual union all
select 1 item, 5  day_of_month, 20 qty_sold from dual union all
select 1 item, 6  day_of_month, 20 qty_sold from dual union all
select 1 item, 7  day_of_month, 35 qty_sold from dual union all
select 1 item, 8  day_of_month, 40 qty_sold from dual union all
select 1 item, 9  day_of_month, 20 qty_sold from dual union all
select 1 item, 10 day_of_month,  2 qty_sold from dual union all
-- item 2:
select 2 item, 1  day_of_month, 10 qty_sold from dual union all
select 2 item, 2  day_of_month, 21 qty_sold from dual union all
select 2 item, 3  day_of_month, 33 qty_sold from dual union all
select 2 item, 7  day_of_month, 20 qty_sold from dual union all -- days 4,5,6 are missing! so 20 is the total sales qty for days 4,5,6 and 7
select 2 item, 8  day_of_month, 14 qty_sold from dual union all
select 2 item, 9  day_of_month, 20 qty_sold from dual union all
select 2 item, 10 day_of_month, 20 qty_sold from dual
);


但是,有时外部系统有时不会发送特定项目的销售信息一天或多天,并且当创建该项目的新记录时,该记录会这样做,其中qty_sold列包含丢失天数和新记录标识的日期期间的总销售量。

我必须创建一个查询,该查询检测提到的丢失天数,并为这些天数创建额外的记录。

这些记录的qty_sold应划分为丢失的天数和包含汇总销售数量的记录 (在项目2的小样本天数4,5、6和7中)

因此,考虑到上面的示例,查询应该返回sales_staging表中存在的所有记录
有3条额外的记录:
2件,每月4天,5(= 20/4) qty_sold
2件,每月5天,5(= 20/4) qty_sold
2件,每月6天,5(= 20/4) qty_sold
随着第7天记录的qty_sold也更新:
2件,每月6天,5(= 20/4) qty_sold


您将如何使用分析函数创建此查询,并知道暂存表包含数千条记录,而性能是一个关键问题?


提前感谢您的帮助和建议!


专家解答

有几个步骤要采取。让我们把它们分解。

首先,您需要一个包含所有这可能是表或您最喜欢的行生成技术。例如:

select rownum dy from dual connect by level <= 10


接下来,对于源中的每个项目,找到当前行和前一行之间的天数。您稍后需要将数量除以丢失天数。你可以用滞后来做到这一点,比如:

day_of_month - lag(day_of_month, 1, 1) over (
  partition by item order by day_of_month
) missing_days


注意: 最终1是默认值,因此这假设您始终从每月1号开始...

然后你需要把这两个连接在一起。但不仅仅是任何外部连接。哦,不。你会想要一个partitioned外部连接。按项目执行此操作意味着您拥有所有项目的所有时间:

with rws as (
  select rownum dy from dual connect by level <= 10
), stg as (
  select ss.*,
         day_of_month - lag(day_of_month, 1, 1) over (
           partition by item order by day_of_month
         ) missing_days
  from   sales_staging ss
)
  select ss.*, rws.dy
  from   rws
  left join stg ss partition by (item) 
  on     dy = day_of_month;

      ITEM DAY_OF_MONTH   QTY_SOLD MISSING_DAYS         DY
---------- ------------ ---------- ------------ ----------
         1            1         20            0          1
         1            2          0            1          2
         1            3         10            1          3
         1            4         20            1          4
         1            5         20            1          5
         1            6         20            1          6
         1            7         35            1          7
         1            8         40            1          8
         1            9         20            1          9
         1           10          2            1         10
         2            1         10            0          1
         2            2         21            1          2
         2            3         33            1          3
         2                                               4
         2                                               5
         2                                               6
         2            7         20            4          7
         2            8         14            1          8
         2            9         20            1          9
         2           10         20            1         10


现在,您需要 “填写” 丢失的数量。

输入first_value。

您可以使用它在结果集中进行展望,以查找下一个非null值。要做到这一点,你需要一个窗口子句。这将从您的当前行到集的末尾 (对于项目) 工作。对前面计算的数量和差值执行此操作:

first_value(qty_sold) ignore nulls over (
  partition by item order by dy
  rows between current row and unbounded following
)


只需确保包含 “忽略null” 子句。否则你仍然会有那些讨厌的缺失值!

现在,您需要做的就是将这些数量除以它们之间的天数 (确保将集合中的开始日映射为1,以避免除以零!)把它们放在一起,你会得到:

create table sales_staging as 
select * from (
-- item 1:
select 1 item, 1  day_of_month, 20 qty_sold from dual union all
select 1 item, 2  day_of_month,  0 qty_sold from dual union all
select 1 item, 3  day_of_month, 10 qty_sold from dual union all
select 1 item, 4  day_of_month, 20 qty_sold from dual union all
select 1 item, 5  day_of_month, 20 qty_sold from dual union all
select 1 item, 6  day_of_month, 20 qty_sold from dual union all
select 1 item, 7  day_of_month, 35 qty_sold from dual union all
select 1 item, 8  day_of_month, 40 qty_sold from dual union all
select 1 item, 9  day_of_month, 20 qty_sold from dual union all
select 1 item, 10 day_of_month,  2 qty_sold from dual union all
-- item 2:
select 2 item, 1  day_of_month, 10 qty_sold from dual union all
select 2 item, 2  day_of_month, 21 qty_sold from dual union all
select 2 item, 3  day_of_month, 33 qty_sold from dual union all
select 2 item, 7  day_of_month, 20 qty_sold from dual union all -- days 4,5,6 are missing! so 20 is the total sales qty for days 4,5,6 and 7
select 2 item, 8  day_of_month, 14 qty_sold from dual union all
select 2 item, 9  day_of_month, 20 qty_sold from dual union all
select 2 item, 10 day_of_month, 20 qty_sold from dual
);

with rws as (
  select rownum dy from dual connect by level <= 10
), stg as (
  select ss.*,
         day_of_month - lag(day_of_month, 1, 1) over (
           partition by item order by day_of_month
         ) missing_days
  from   sales_staging ss
), qty as (
  select ss.*, rws.dy,
         first_value(qty_sold) ignore nulls over (
           partition by item order by dy
           rows between current row and unbounded following
         ) nv,
         first_value(missing_days) ignore nulls over (
           partition by item order by dy
           rows between current row and unbounded following
         ) mdays
  from   rws
  left join stg ss partition by (item) 
  on     dy = day_of_month
)
  select q.* ,
         nv / case when mdays = 0 then 1 else mdays end qty
  from   qty q
  order  by item , dy;

      ITEM DAY_OF_MONTH   QTY_SOLD MISSING_DAYS         DY         NV      MDAYS        QTY
---------- ------------ ---------- ------------ ---------- ---------- ---------- ----------
         1            1         20            0          1         20          0         20
         1            2          0            1          2          0          1          0
         1            3         10            1          3         10          1         10
         1            4         20            1          4         20          1         20
         1            5         20            1          5         20          1         20
         1            6         20            1          6         20          1         20
         1            7         35            1          7         35          1         35
         1            8         40            1          8         40          1         40
         1            9         20            1          9         20          1         20
         1           10          2            1         10          2          1          2
         2            1         10            0          1         10          0         10
         2            2         21            1          2         21          1         21
         2            3         33            1          3         33          1         33
         2                                               4         20          4          5
         2                                               5         20          4          5
         2                                               6         20          4          5
         2            7         20            4          7         20          4          5
         2            8         14            1          8         14          1         14
         2            9         20            1          9         20          1         20
         2           10         20            1         10         20          1         20


请注意,您需要对此进行调整,以便可以跨越一个月的界限。最简单的方法是将那些day_of_month值转换为真实日期...

有关分区外部联接的更多信息:
http://www.oracle-developer.net/display.php?id=312
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论