Forward Rolling Average in Hive Query

【Title】: Forward Rolling Average in Hive Query 【Posted】: 2019-11-29 03:13:06 【Question】:

I want to compute a rolling average over a 4-day window. The details are below.

Create table stock(day int, time String, cost float);

Insert into stock values(1,"8 AM",3.1);
Insert into stock values(1,"9 AM",3.2);
Insert into stock values(1,"10 AM",4.5);
Insert into stock values(1,"11 AM",5.5);
Insert into stock values(2,"8 AM",5.1);
Insert into stock values(2,"9 AM",2.2);
Insert into stock values(2,"10 AM",1.5);
Insert into stock values(2,"11 AM",6.5);
Insert into stock values(3,"8 AM",8.1);
Insert into stock values(3,"9 AM",3.2);
Insert into stock values(3,"10 AM",2.5);
Insert into stock values(3,"11 AM",4.5);
Insert into stock values(4,"8 AM",3.1);
Insert into stock values(4,"9 AM",1.2);
Insert into stock values(4,"10 AM",0.5);
Insert into stock values(4,"11 AM",1.5); 

I wrote the following query:

select day, cost,sum(cost) over (order by day range between current row and 4 Following), avg(cost) over (order by day range between current row and 4 Following) 
from stock

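As you can see, I get 4 records per day, and I need a rolling average over a 4-day window. That is why I wrote the window query above. Since I only have 4 days of data with 4 records each, the sum for day 1 is the sum of all 16 records. The sum for the first record is therefore 56.20, which is correct. The average should be 56.20/4 (because there are 4 days), but I get 56.20/16 because there are 16 records in total. How can I fix the average part?

Thanks, Raj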

【Question comments】:

【Answer 1】:

Is this what you want?

select t.*,
       avg(cost) over (order by day range between current row and 4 following)
from t;

Edit:

What you seem to want is:

select t.*,
       (sum(cost) over (order by day range between current row and 3 following) /
        count(distinct day) over (order by day range between current row and 3 following)
       )
from t;

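However, Hive does not support this. You can use a subquery instead: flag the first row of each day, then sum the flags over the window to count the days it contains: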

select t.*,
       -- total cost in the window divided by the number of days in the window
       (sum(cost) over (order by day range between current row and 3 following) /
        sum(case when seqnum = 1 then 1 else 0 end) over (order by day range between current row and 3 following)
       )
from (select t.*,
             -- seqnum = 1 marks exactly one row per day
             row_number() over (partition by day order by time) as seqnum
      from t
     ) t
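
To check the arithmetic outside Hive, the subquery approach can be simulated in plain Python. This is only a sketch under assumptions: the names `Row`, `with_seqnum` and `rolling_stats` are made up for illustration, and the window is taken to be "days in [day, day + 3]", which is how the `range between current row and 3 following` frame is read here. It uses the 16 rows from the question and compares the per-record average with the per-day average.

```python
from collections import namedtuple

# Hypothetical stand-in for the stock table from the question.
Row = namedtuple("Row", ["day", "time", "cost"])

TIMES = ["8 AM", "9 AM", "10 AM", "11 AM"]
COSTS_BY_DAY = {
    1: [3.1, 3.2, 4.5, 5.5],
    2: [5.1, 2.2, 1.5, 6.5],
    3: [8.1, 3.2, 2.5, 4.5],
    4: [3.1, 1.2, 0.5, 1.5],
}
rows = [Row(day, time, cost)
        for day, costs in COSTS_BY_DAY.items()
        for time, cost in zip(TIMES, costs)]


def with_seqnum(rows):
    """Mimic row_number() over (partition by day order by time)."""
    counters = {}
    numbered = []
    for row in sorted(rows, key=lambda r: (r.day, r.time)):
        counters[row.day] = counters.get(row.day, 0) + 1
        numbered.append((row, counters[row.day]))
    return numbered


def rolling_stats(numbered, day, following=3):
    """Return (total, per_record_avg, per_day_avg) for days in [day, day + following]."""
    window = [(r, s) for r, s in numbered if day <= r.day <= day + following]
    total = sum(r.cost for r, _ in window)
    # Exactly one row per day has seqnum == 1, so this counts the days in the window.
    day_count = sum(1 for _, s in window if s == 1)
    return total, total / len(window), total / day_count


numbered = with_seqnum(rows)
for day in sorted(COSTS_BY_DAY):
    total, per_record, per_day = rolling_stats(numbered, day)
    print(day, round(total, 2), round(per_record, 4), round(per_day, 4))
```

For day 1 the window covers all 16 rows, so the per-record average divides the total by 16 while the per-day average divides it by 4, which is the difference the question describes.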

【Comments】:

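Hi, as you can see in my query, I have already tried that. The problem is that it computes 56/16 (4*4), but I want 56/4. I need the average per day, not over all the records of every day (4 * 4). Any ideas?

Thanks for the quick reply, Gordon. You really saved my day!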
