SQL - LAG to get previous value if condition using multiple previous columns satisfied
Posted: 2021-01-26 17:06:34

Question:
I have a table created by:
CREATE TABLE #test_table
(
id INT
,EventName VARCHAR(50)
,HomeTeam VARCHAR(25)
,Metric INT
)
INSERT INTO #test_table VALUES
(1, 'Team A vs Team B', 'Team A', 5),
(2, 'Team A vs Team B', 'Team A', 7),
(3, 'Team C vs Team D', 'Team C', 6),
(4, 'Team Z vs Team A', 'Team Z', 8),
(5, 'Team A vs Team B', 'Team A', 9),
(6, 'Team C vs Team D', 'Team C', 3),
(7, 'Team C vs Team D', 'Team C', 1),
(8, 'Team E vs Team F', 'Team E', 2)
Result:
id EventName HomeTeam Metric
------------------------------------------
1 Team A vs Team B Team A 5
2 Team A vs Team B Team A 7
3 Team C vs Team D Team C 6
4 Team Z vs Team A Team Z 8
5 Team A vs Team B Team A 9
6 Team C vs Team D Team C 3
7 Team C vs Team D Team C 1
8 Team E vs Team F Team E 2
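I want to calculate a new column PreviousMetricN, where N can be 1, 2, 3, ..., which shows the Nth previous value of Metric, counting only earlier events in which the HomeTeam took part (as either the home or the away side). For example: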
id EventName HomeTeam Metric PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1 Team A vs Team B Team A 5 NULL NULL
2 Team A vs Team B Team A 7 5 NULL
3 Team C vs Team D Team C 6 NULL NULL
4 Team Z vs Team A Team Z 8 NULL NULL
5 Team A vs Team B Team A 9 8 7
6 Team C vs Team D Team C 3 6 NULL
7 Team C vs Team D Team C 1 3 6
8 Team E vs Team F Team E 2 NULL NULL
I have been trying variations of LAG with a new grouping variable in the PARTITION BY clause, such as
LAG(Metric) OVER(Partition by (CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN 1 ELSE 0 END) ORDER BY id)
but without any success. How can this be done?
Edit: I have also asked this question for pandas here: Pandas shift - get previous value if multiple conditions satisfied
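Comments:
Why shouldn't PreviousMetric1 be 7 for id=5? What is PreviousMetric2?
Please explain in plain language what you are looking for.
Please check my answer. It will solve your problem.

Answer 1:
I don't see an answer here that uses window functions and a single scan of the table. We can run this query in a single scan, as follows: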
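This assumes you have AwayTeam in another column.
If you don't have it yet and want to parse it out of EventName, you can use: SUBSTRING(EventName, CHARINDEX(' vs ', EventName) + 4, LEN(EventName))
I urge you to follow proper normalization and create it as a proper column in the table.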
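As a rough sketch of that normalization step, AwayTeam could be exposed as a computed column. This assumes every EventName has the form 'Home vs Away' with the literal separator ' vs ', which holds for the sample data but is not guaranteed by the question:

```sql
-- Sketch only: assumes EventName always looks like '<HomeTeam> vs <AwayTeam>'.
-- If ' vs ' is missing, CHARINDEX returns 0 and the result is not meaningful.
ALTER TABLE #test_table
    ADD AwayTeam AS SUBSTRING(EventName, CHARINDEX(' vs ', EventName) + 4, LEN(EventName));
```

A plain column populated at insert time would serve equally well; the computed column simply keeps AwayTeam in step with EventName.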
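Our algorithm runs like this:
- Multiply out (unpivot) the two teams as separate rows using CROSS APPLY
- Calculate the previous Metric values using LAG, partitioning by the merged Team column
- Filter the doubled rows back down so that we get only one row per original row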
SELECT id, HomeTeam, AwayTeam, Metric, Prev1, Prev2, Prev3
FROM (
SELECT *
,Prev1 = LAG(Metric, 1) OVER (PARTITION BY v.Team ORDER BY id)
,Prev2 = LAG(Metric, 2) OVER (PARTITION BY v.Team ORDER BY id)
,Prev3 = LAG(Metric, 3) OVER (PARTITION BY v.Team ORDER BY id)
-- more of these ......
FROM test_table
CROSS APPLY (VALUES (HomeTeam, 1),(AwayTeam, 0)) AS v(Team,IsHome)
) AS t
WHERE IsHome = 1
-- ORDER BY id --if necessary
Importantly, we can do this without multiple different sorts or partitionings and without a self-join. It takes just one scan.
Result:
id | HomeTeam | AwayTeam | Metric | Prev1 | Prev2 | Prev3 |
---|---|---|---|---|---|---|
1 | Team A | Team B | 5 | (null) | (null) | (null) |
2 | Team A | Team B | 7 | 5 | (null) | (null) |
3 | Team C | Team D | 6 | (null) | (null) | (null) |
4 | Team Z | Team A | 8 | (null) | (null) | (null) |
5 | Team A | Team B | 9 | 8 | 7 | 5 |
6 | Team C | Team D | 3 | 6 | (null) | (null) |
7 | Team C | Team D | 1 | 3 | 6 | (null) |
8 | Team E | Team F | 2 | (null) | (null) | (null) |
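If adding a column is not an option, the same single-scan idea can be sketched directly against the question's #test_table by deriving the away team inline. This again assumes EventName always has the form 'Home vs Away'; the aliases t, a and x are introduced here purely for illustration:

```sql
-- Sketch only: parses AwayTeam from EventName on the fly, then applies
-- the unpivot + LAG + filter steps described above.
SELECT x.id, x.HomeTeam, x.AwayTeam, x.Metric, x.Prev1, x.Prev2
FROM (
    SELECT t.id, t.HomeTeam, a.AwayTeam, t.Metric, v.IsHome
          ,Prev1 = LAG(t.Metric, 1) OVER (PARTITION BY v.Team ORDER BY t.id)
          ,Prev2 = LAG(t.Metric, 2) OVER (PARTITION BY v.Team ORDER BY t.id)
    FROM #test_table AS t
    -- Derive the away team; assumes the literal separator ' vs '
    CROSS APPLY (VALUES (SUBSTRING(t.EventName,
                                   CHARINDEX(' vs ', t.EventName) + 4,
                                   LEN(t.EventName)))) AS a(AwayTeam)
    -- One row per participating team, flagged home (1) or away (0)
    CROSS APPLY (VALUES (t.HomeTeam, 1), (a.AwayTeam, 0)) AS v(Team, IsHome)
) AS x
WHERE x.IsHome = 1   -- keep only the original (home) row of each event
ORDER BY x.id;
```

Because Team A's partition contains ids 1, 2, 4 and 5, the row for id 5 picks up Metric 8 from id 4, where Team A played away, and 7 from id 2.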
Comments:
This is a smart approach and definitely the most efficient solution. Best wishes.

Answer 2:
The logic seems to be:
lag(metric, <n>) over (partition by hometeam order by id)
I don't see why eventName is needed.
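Comments:
Ah - I should have explained this more clearly. It is needed because a team that appears as HomeTeam can also take part in a game without being the HomeTeam in that game. Updating the question now to demonstrate this.

Answer 3:
Using OUTER APPLY and correlated subqueries: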
SELECT *
FROM test_table c
OUTER APPLY (SELECT TOP 1 PreviousMetric1 = c2.Metric
FROM test_table c2
WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0
AND c.id > c2.id
ORDER BY id DESC) s1
OUTER APPLY (SELECT PreviousMetric2 = c2.Metric
FROM test_table c2
WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0
AND c.id > c2.id
ORDER BY id DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY) s2
ORDER BY id;
db<>fiddle demo
Output:
+-----+-------------------+-----------+---------+------------------+-----------------+
| id | EventName | HomeTeam | Metric | PreviousMetric1 | PreviousMetric2 |
+-----+-------------------+-----------+---------+------------------+-----------------+
| 1 | Team A vs Team B | Team A | 5 | | |
| 2 | Team A vs Team B | Team A | 7 | 5 | |
| 3 | Team C vs Team D | Team C | 6 | | |
| 4 | Team Z vs Team A | Team Z | 8 | | |
| 5 | Team A vs Team B | Team A | 9 | 8 | 7 |
| 6 | Team C vs Team D | Team C | 3 | 6 | |
| 7 | Team C vs Team D | Team C | 1 | 3 | 6 |
| 8 | Team E vs Team F | Team E | 2 | | |
+-----+-------------------+-----------+---------+------------------+-----------------+
Extending this to PreviousMetricN is a matter of adding a corresponding OUTER APPLY sN with OFFSET N-1 ROWS FETCH ...
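For instance, a third previous value might be sketched as below. The alias s3 and the column name PreviousMetric3 simply follow the naming pattern of the answer, and the query targets the question's #test_table rather than the demo's test_table:

```sql
-- Sketch only: PreviousMetric3 skips the two most recent matching events
-- (OFFSET 2 ROWS) and takes the next one.
SELECT c.id, c.EventName, c.HomeTeam, c.Metric, s3.PreviousMetric3
FROM #test_table AS c
OUTER APPLY (SELECT PreviousMetric3 = c2.Metric
             FROM #test_table AS c2
             WHERE CHARINDEX(c.HomeTeam, c2.EventName) > 0  -- team took part
               AND c2.id < c.id                             -- earlier events only
             ORDER BY c2.id DESC
             OFFSET 2 ROWS FETCH NEXT 1 ROW ONLY) AS s3
ORDER BY c.id;
```

In the full query this OUTER APPLY would sit alongside s1 and s2. Each additional sN is another correlated lookup against the table, so the cost grows with N.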
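Answer 4:
First, using a self-join and a common table expression, I ranked all earlier events whose name contains the home team. PreviousMetric1 comes from the most recent matching event, and the LEAD() window function gives PreviousMetric2. Please check the following query: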
with cte as(
select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,
LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
from #test_table a
left join #test_table b
on charindex(a.hometeam,b.eventname)>0 and a.id>b.id
)select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2 from cte
where rownum=1
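You can also get PreviousMetric3 by applying LEAD() with 2 as the second argument. In this way you can have as many previous metrics as you like. It is faster than any other approach.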
;with cte as(
select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,
LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
LEAD(b.metric,2)over (partition by a.id order by b.id desc) PreviousMetric3,
row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
from #test_table a
left join #test_table b
on charindex(a.hometeam,b.eventname)>0 and a.id>b.id
)select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2 ,PreviousMetric3 from cte
where rownum=1
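Comments:
A nice solution, but I will have to award the answer to @Charlieface's solution because of its efficiency: it scans the table only once and avoids self-joins, multiple sorts, and so on.
Glad to know you got the solution you wanted. Yes, if you have the away team in another column, then that is the better solution. Best wishes. This was a good challenge. Your question deserves an upvote.

Answer 5:
I believe this is what you are looking for: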
;with cte as (
select id
, eventname
, hometeam
, metric
, CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(Metric) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous from #test_table
)
select * ,CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(previous) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous2
from cte
order by 1