SQL LAG Function and Grouping/Ordering
Posted: 2020-06-05 08:49:32
Question:
Consider the following table:
+--------------------------------------------------+
| Users |
+-------+---------+----------+----------+----------+
|acc_num| user_id | date | amount | sum |
+-------+---------+----------+----------+----------+
| a1 | u1 | 20201209 | 20 | null |
| a1 | u1 | 20201209 | 20 | |
| a1 | u1 | 20201209 | 20 | |
+-------+---------+----------+----------+----------+
| a1 | u2 | 20201208 | 30 | 50 | Correct
| a1 | u2 | 20201208 | 30 | |
| a1 | u2 | 20201208 | 30 | |
+-------+---------+----------+----------+----------+
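As shown above, I want to compute a sum from today's amount and the previous day's amount. Since I am on Postgres, I am using the LAG function. However, even though I group and order by acc_num, user_id, and date, the value in the new sum column is sometimes 60 and sometimes 50. So sometimes I get the result above, and sometimes I get the following: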
+--------------------------------------------------+
| Users |
+-------+---------+----------+----------+----------+
|acc_num| user_id | date | amount | sum |
+-------+---------+----------+----------+----------+
| a1 | u1 | 20201209 | 20 | null |
| a1 | u1 | 20201209 | 20 | |
| a1 | u1 | 20201209 | 20 | |
+-------+---------+----------+----------+----------+
| a1 | u2 | 20201208 | 30 | |
| a1 | u2 | 20201208 | 30 | 60 | Wrong
| a1 | u2 | 20201208 | 30 | |
+-------+---------+----------+----------+----------+
Or I get this:
+--------------------------------------------------+
| Users |
+-------+---------+----------+----------+----------+
|acc_num| user_id | date | amount | sum |
+-------+---------+----------+----------+----------+
| a1 | u1 | 20201209 | 20 | null |
| a1 | u1 | 20201209 | 20 | |
| a1 | u1 | 20201209 | 20 | |
+-------+---------+----------+----------+----------+
| a1 | u2 | 20201208 | 30 | |
| a1 | u2 | 20201208 | 30 | |
| a1 | u2 | 20201208 | 30 | 60 | Wrong
+-------+---------+----------+----------+----------+
My SQL currently looks like this:
SELECT acc_num, user_id, date,
(CASE
WHEN (amount > 0)
THEN LAG(amount, 1) OVER (ORDER BY acc_num, user_id, date) + amount
ELSE NULL
END
) AS sum
FROM Users
GROUP BY acc_num, user_id, date
ORDER BY acc_num, user_id, date
I also tried:
SELECT acc_num, user_id, date,
(CASE
WHEN (amount > 0)
THEN LAG(amount, 1) OVER (ORDER BY date) + amount
ELSE NULL
END
) AS sum
FROM Users
GROUP BY acc_num, user_id, date
ORDER BY acc_num, user_id, date
Any ideas? Thanks.
Comments:
Why are there multiple rows with identical values? Does your table have a primary key?
Answer 1:
Use lead.
Try the following; here is a demo.
select
acc_num,
user_id,
date,
amount,
sum(case when (nm is not null and amount <> nm) then nm + amount end) over (partition by acc_num, user_id) as sum
from
(
select
*,
lead(amount) over (order by date) as nm
from myTable
) t
Output:
| acc_num | user_id | date | amount | sum |
| ------- | ------- | -------- | ------ | --- |
| a1 | u1 | 20201209 | 20 | |
| a1 | u1 | 20201209 | 20 | |
| a1 | u1 | 20201209 | 20 | |
| a1 | u2 | 20201208 | 30 | 50 |
| a1 | u2 | 20201208 | 30 | 50 |
| a1 | u2 | 20201208 | 30 | 50 |
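To try the lead-based query without a Postgres server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite (3.25 or newer, for window functions) is an assumption standing in for Postgres, and the integer `date` column and the final `ORDER BY user_id` are choices made here only to keep the example small and its output order fixed.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Postgres; table and data mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (acc_num TEXT, user_id TEXT, date INTEGER, amount INTEGER)"
)
rows = [("a1", "u1", 20201209, 20)] * 3 + [("a1", "u2", 20201208, 30)] * 3
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

query = """
SELECT acc_num, user_id, date, amount,
       -- Only the row whose next amount differs contributes; SUM spreads it over the group.
       SUM(CASE WHEN nm IS NOT NULL AND amount <> nm THEN nm + amount END)
           OVER (PARTITION BY acc_num, user_id) AS sum
FROM (
    SELECT *, LEAD(amount) OVER (ORDER BY date) AS nm
    FROM myTable
) t
ORDER BY user_id  -- added here only to make the printed order deterministic
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that the `amount <> nm` test is what detects the day boundary in this query, so it appears to rely on consecutive days having different amounts; with equal amounts on both days no row would satisfy the condition.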
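Comments:
Thank you very much. It solved my problem.
Answer 2:
If I understand correctly, you want the value from the previous day on only one row. This is a bit tricky for a couple of reasons, most importantly because you do not have a stable numbering of the rows; that is, there are duplicates.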
I suggest:
select t.*,
(case when lag(date) over (order by seqnum) < date
then amount + lag(amount) over (order by seqnum)
end)
from (select t.*, row_number() over (order by date, acc_num, user_id) as seqnum
from mytable t
) t
order by seqnum;
Here is a dbfiddle.
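The row_number() approach above can be sketched as a runnable example using Python's built-in sqlite3 module. SQLite (3.25 or newer) is an assumption standing in for Postgres, and the column alias `day_sum` and the subquery alias `m` are names introduced here for illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Postgres; data mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (acc_num TEXT, user_id TEXT, date INTEGER, amount INTEGER)"
)
rows = [("a1", "u1", 20201209, 20)] * 3 + [("a1", "u2", 20201208, 30)] * 3
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
SELECT t.acc_num, t.user_id, t.date, t.amount,
       -- Only the first row of a new day sees an earlier date in the previous row.
       CASE WHEN LAG(date) OVER (ORDER BY seqnum) < date
            THEN amount + LAG(amount) OVER (ORDER BY seqnum)
       END AS day_sum
FROM (
    -- seqnum gives every row a unique position, so LAG has a single well-defined predecessor.
    SELECT m.*, ROW_NUMBER() OVER (ORDER BY date, acc_num, user_id) AS seqnum
    FROM mytable m
) t
ORDER BY seqnum
"""
result = conn.execute(query).fetchall()
day_sums = [row[4] for row in result]
print(day_sums)
```

With this data the sum lands on exactly one row, the first row of the later day, because that is the only row whose predecessor has an earlier date.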
If you want this on all three rows, you can also use last_value() with a range window frame:
select t.*,
amount + last_value(amount) over (order by date range between unbounded preceding and interval '1 day' preceding)
from mytable t
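A rough way to experiment with the range-frame idea locally is sketched below with Python's built-in sqlite3 module. This is an adaptation under assumptions, not the Postgres query itself: SQLite has no `interval` offsets, so the sketch stores dates as ISO text, orders by `julianday(date)`, and uses a numeric `1 PRECEDING` offset (one day), which needs SQLite 3.28 or newer. The alias `day_sum` is a name introduced here.

```python
import sqlite3

# Assumption: SQLite (3.28+ for RANGE frames with offsets) stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (acc_num TEXT, user_id TEXT, date TEXT, amount INTEGER)"
)
rows = [("a1", "u1", "2020-12-09", 20)] * 3 + [("a1", "u2", "2020-12-08", 30)] * 3
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
SELECT acc_num, user_id, date, amount,
       -- The frame holds every row dated at least one day earlier;
       -- LAST_VALUE takes the last of those rows in date order.
       amount + LAST_VALUE(amount) OVER (
           ORDER BY julianday(date)
           RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS day_sum
FROM mytable
ORDER BY date
"""
result = conn.execute(query).fetchall()
day_sums = [row[4] for row in result]
print(day_sums)
```

Here every row of the later day gets the sum, while rows of the earliest day get NULL because their frame is empty. Since the frame starts at `UNBOUNDED PRECEDING`, the value comes from the most recent earlier day present in the data, which is not necessarily the calendar day immediately before.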
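Comments:
Thanks for your reply. I had already implemented the earlier answer and it works. Thanks anyway for the reply.
@Stacky . . . Your question asks for the value on a single row, not on all three rows, which is why I provided this answer.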