Compute running sum in a window function

Posted 2023-03-31


Question (posted 2015-11-06 19:15:43):

I have a problem with this running sum in Redshift (using Postgres 8):

select extract(month from registration_time) as month
 , extract(week from registration_time)%4+1 as week
 , extract(day from registration_time) as day
 , count(*) as count_of_users_registered
 , sum(count(*)) over (ORDER BY (1,2,3))
from loyalty.v_user
group by 1,2,3
order by 1,2,3
;

The error I get is:

ERROR: 42601: Aggregate window functions with an ORDER BY clause require a frame clause

Comments:

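You cannot use column numbers to order by in a window definition ((1,2,3) is not the same as 1,2,3; don't use useless parentheses). over (order by registration_time) should do what you want.

I still get the same error when using order by registration_time. Is the syntax of my sum function itself correct?

Answer 1: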

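You can run a window function on the result of an aggregate function on the same query level. It's just much simpler to use a subquery in this case: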

SELECT *, sum(count_registered_users) OVER (ORDER BY month, week, day) AS running_sum
FROM  (
   SELECT extract(month FROM registration_time)::int     AS month
        , extract(week  FROM registration_time)::int%4+1 AS week
        , extract(day   FROM registration_time)::int     AS day
        , count(*) AS count_registered_users
   FROM   loyalty.v_user
   GROUP  BY 1, 2, 3
   ) sub
ORDER  BY month, week, day;  -- sort the final result in the outer query

I also fixed the syntax of the expression that computes week: extract() returns double precision, but the modulo operator % does not accept double precision numbers, so I cast all three to integer.
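A minimal, self-contained sketch of the subquery approach, assuming a hypothetical CTE named v_user (with month, week and day already extracted, and made-up values) in place of loyalty.v_user. The inner query does the plain aggregation; the outer query runs the window function over its result rows. This relies on PostgreSQL's default window frame, so on Redshift an explicit frame clause would still be needed.

```sql
-- Hypothetical sample data standing in for loyalty.v_user:
-- one row per registered user, with month/week/day already extracted.
WITH v_user(month, week, day) AS (
    VALUES (11, 1, 1), (11, 1, 1),
           (11, 2, 2),
           (11, 3, 9), (11, 3, 9), (11, 3, 9)
)
SELECT sub.*,
       -- The window function runs over the already-aggregated rows of sub
       sum(count_registered_users) OVER (ORDER BY month, week, day) AS running_sum
FROM  (
   -- Inner level: plain aggregation, one row per (month, week, day)
   SELECT month, week, day, count(*) AS count_registered_users
   FROM   v_user
   GROUP  BY month, week, day
   ) sub
ORDER  BY month, week, day;
```

With this sample data, count_registered_users is 2, 1, 3 and running_sum is 2, 3, 6.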

As @a_horse commented, you cannot use positional references in the ORDER BY clause of a window function (unlike in the ORDER BY clause of the query).

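However, you cannot use over (order by registration_time) in this query, because you are grouping by month, week, day. registration_time is neither aggregated nor listed in the GROUP BY clause, as would be required. At that stage of query evaluation, the column is no longer accessible.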

To make it work, you could repeat the expressions of the first three SELECT items in the window's ORDER BY clause:

SELECT extract(month FROM registration_time)::int     AS month
     , extract(week  FROM registration_time)::int%4+1 AS week
     , extract(day   FROM registration_time)::int     AS day
     , count(*) AS count_registered_users
     , sum(count(*)) OVER (ORDER BY 
              extract(month FROM registration_time)::int
            , extract(week  FROM registration_time)::int%4+1
            , extract(day   FROM registration_time)::int) AS running_sum
FROM   loyalty.v_user
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;

But that seems rather noisy. (Performance would be good, though.)

As an aside: I do wonder about the purpose behind week%4+1 ... the whole query might be simpler.

Related:

Get the distinct sum of a joined table column
PostgreSQL: running count of rows for a query 'by minute'

Comments:

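Yes, your answer and suggestions make a lot of sense. I kept getting the same error until I tried adding rows unbounded preceding after the order by month, week, day clause, and then it worked. So for anyone who runs into this in the future: it looks like rows unbounded preceding (or whatever framing option you want) is required. Again, the AWS documentation isn't clear about this, since this is on Redshift.

@simplycoding: Redshift is something you should have mentioned at the start of your question, because it is not Postgres. It is derived from it, but substantially different. The code above works as-is in Postgres. To quote the manual: "The default framing option is RANGE UNBOUNDED PRECEDING".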
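A sketch of the single-level query with the explicit frame clause the asker found necessary on Redshift. ROWS UNBOUNDED PRECEDING is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because GROUP BY makes every (month, week, day) combination unique, a ROWS frame should give the same result here as PostgreSQL's default RANGE frame, which only differs when sort keys tie. The VALUES-based CTE named v_user is hypothetical sample data to keep the sketch self-contained; Redshift may not accept it, so there you would keep loyalty.v_user and the extract() expressions from the question.

```sql
-- Hypothetical sample data standing in for loyalty.v_user
WITH v_user(month, week, day) AS (
    VALUES (11, 1, 1), (11, 1, 1),
           (11, 2, 2),
           (11, 3, 9), (11, 3, 9), (11, 3, 9)
)
SELECT month, week, day
     , count(*) AS count_registered_users
       -- Explicit frame: Redshift requires one for aggregate window functions
       -- with ORDER BY; PostgreSQL would otherwise default to RANGE UNBOUNDED PRECEDING
     , sum(count(*)) OVER (ORDER BY month, week, day
                           ROWS UNBOUNDED PRECEDING) AS running_sum
FROM   v_user
GROUP  BY month, week, day
ORDER  BY month, week, day;
```

For the sample rows, running_sum is again 2, 3, 6.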
