Select top rows until value in specific column has appeared twice
Posted: 2020-05-04 03:31:14

Question:
I have the following query, in which I am trying to select all records, ordered by date, up to the point where EmailApproved = 1 is found for the second time. The second record with EmailApproved = 1 should not be selected.
declare @Test table (id int, EmailApproved bit, Created datetime);
insert into @Test (id, EmailApproved, Created)
values
(1,0,'2011-03-07 03:58:58.423')
, (2,0,'2011-02-21 04:55:52.103')
, (3,0,'2011-01-29 13:24:02.103')
, (4,1,'2010-10-12 14:41:54.217')
, (5,0,'2010-10-12 14:34:15.903')
, (6,0,'2010-10-12 10:10:19.123')
, (7,1,'2010-08-27 12:07:16.073')
, (8,1,'2010-08-25 12:15:49.413')
, (9,0,'2010-08-25 12:14:51.970')
, (10,1,'2010-04-12 16:43:44.777');
select *
    , case when Row1 = Row2 then 1 else 0 end Row1EqualsRow2
from (
    select id, EmailApproved, Created
        , row_number() over (partition by EmailApproved order by Created desc) Row1
        , row_number() over (order by Created desc) Row2
    from @Test
) X
--where Row1 = Row2
order by Created desc;
This produces the following result:
id  EmailApproved  Created                  Row1  Row2  Row1EqualsRow2
1   0              2011-03-07 03:58:58.423  1     1     1
2   0              2011-02-21 04:55:52.103  2     2     1
3   0              2011-01-29 13:24:02.103  3     3     1
4   1              2010-10-12 14:41:54.217  1     4     0
5   0              2010-10-12 14:34:15.903  4     5     0
6   0              2010-10-12 10:10:19.123  5     6     0
7   1              2010-08-27 12:07:16.073  2     7     0
8   1              2010-08-25 12:15:49.413  3     8     0
9   0              2010-08-25 12:14:51.970  6     9     0
10  1              2010-04-12 16:43:44.777  4     10    0
What I actually want is:
id  EmailApproved  Created                  Row1  Row2  Row1EqualsRow2
1   0              2011-03-07 03:58:58.423  1     1     1
2   0              2011-02-21 04:55:52.103  2     2     1
3   0              2011-01-29 13:24:02.103  3     3     1
4   1              2010-10-12 14:41:54.217  1     4     0
5   0              2010-10-12 14:34:15.903  4     5     0
6   0              2010-10-12 10:10:19.123  5     6     0
Note: Row1, Row2 and Row1EqualsRow2 are just working columns that show my calculation.
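Answer 1:
Steps:
1. Create a row number rn over all rows, in case id is not in sequence.
2. Create a row number approv_rn, partitioned by EmailApproved, so that we know which row is the second one with EmailApproved = 1.
3. Use outer apply to find the row number (rn) of the second instance of EmailApproved = 1.
4. In the where clause, filter out every row whose row number is >= the value found in step 3.
5. If only 1 or 0 records with EmailApproved = 1 exist, the outer apply returns null, in which case all available rows are returned.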
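with test as
(
    select *,
        -- position of each row in the overall date ordering
        rn = row_number() over (order by Created desc),
        -- position of each row within its EmailApproved group
        approv_rn = row_number() over (partition by EmailApproved
                                       order by Created desc)
    from @Test
)
select *
from test t
outer apply
(
    -- overall row number of the second approved row, if there is one
    select x.rn
    from test x
    where x.EmailApproved = 1
    and x.approv_rn = 2
) x
-- keep rows before the second approved row, or everything if there is none
where t.rn < x.rn or x.rn is null
order by t.Created desc;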
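For comparison, the same cutoff can also be sketched with a running count rather than outer apply. This is not the approach from the answer above, only an alternative under stated assumptions: it relies on windowed sum with an order by clause, which needs SQL Server 2012 or later, and the column name approved_so_far is made up for illustration. The sample data is repeated so the batch runs on its own.

```sql
declare @Test table (id int, EmailApproved bit, Created datetime);

insert into @Test (id, EmailApproved, Created)
values
  (1,0,'2011-03-07 03:58:58.423')
, (2,0,'2011-02-21 04:55:52.103')
, (3,0,'2011-01-29 13:24:02.103')
, (4,1,'2010-10-12 14:41:54.217')
, (5,0,'2010-10-12 14:34:15.903')
, (6,0,'2010-10-12 10:10:19.123')
, (7,1,'2010-08-27 12:07:16.073')
, (8,1,'2010-08-25 12:15:49.413')
, (9,0,'2010-08-25 12:14:51.970')
, (10,1,'2010-04-12 16:43:44.777');

select id, EmailApproved, Created
from (
    select id, EmailApproved, Created
        -- approved_so_far (hypothetical name): how many approved rows have
        -- been seen up to and including this row, newest first.
        -- bit cannot be summed directly, hence the cast to int.
        , approved_so_far = sum(cast(EmailApproved as int))
                            over (order by Created desc
                                  rows between unbounded preceding and current row)
    from @Test
) X
-- the second approved row is the first one where the count reaches 2
where approved_so_far < 2
order by Created desc;
```

With the sample data this keeps ids 1 through 6. If fewer than two approved rows exist, the running count never reaches 2, so every row is returned, which matches the behaviour described in step 5.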