Select top rows until value in specific column has appeared twice

Posted: 2020-05-04 03:31:14

Question:

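I have the following query, in which I am trying to select all records, ordered by date (newest first), until EmailApproved = 1 is found for the second time. The second record with EmailApproved = 1 should not itself be selected.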

declare @Test table (id int, EmailApproved bit, Created datetime);

insert into @Test (id, EmailApproved, Created)
values
  (1,0,'2011-03-07 03:58:58.423')
  , (2,0,'2011-02-21 04:55:52.103')
  , (3,0,'2011-01-29 13:24:02.103')
  , (4,1,'2010-10-12 14:41:54.217')
  , (5,0,'2010-10-12 14:34:15.903')
  , (6,0,'2010-10-12 10:10:19.123')
  , (7,1,'2010-08-27 12:07:16.073')
  , (8,1,'2010-08-25 12:15:49.413')
  , (9,0,'2010-08-25 12:14:51.970')
  , (10,1,'2010-04-12 16:43:44.777');

select *
  , case when Row1 = Row2 then 1 else 0 end Row1EqualsRow2
from (
  select id, EmailApproved, Created
    , row_number() over (partition by EmailApproved order by Created desc) Row1
    , row_number() over (order by Created desc) Row2
  from @Test
) X
--where Row1 = Row2
order by Created desc;

This produces the following result:

id  EmailApproved   Created                 Row1    Row2    Row1EqualsRow2
1   0               2011-03-07 03:58:58.423 1       1       1
2   0               2011-02-21 04:55:52.103 2       2       1
3   0               2011-01-29 13:24:02.103 3       3       1
4   1               2010-10-12 14:41:54.217 1       4       0
5   0               2010-10-12 14:34:15.903 4       5       0
6   0               2010-10-12 10:10:19.123 5       6       0
7   1               2010-08-27 12:07:16.073 2       7       0
8   1               2010-08-25 12:15:49.413 3       8       0
9   0               2010-08-25 12:14:51.970 6       9       0
10  1               2010-04-12 16:43:44.777 4       10      0

What I actually want is:

id  EmailApproved   Created                 Row1    Row2    Row1EqualsRow2
1   0               2011-03-07 03:58:58.423 1       1       1
2   0               2011-02-21 04:55:52.103 2       2       1
3   0               2011-01-29 13:24:02.103 3       3       1
4   1               2010-10-12 14:41:54.217 1       4       0
5   0               2010-10-12 14:34:15.903 4       5       0
6   0               2010-10-12 10:10:19.123 5       6       0

Note: Row1, Row2 and Row1EqualsRow2 are just working columns that show my calculation.

Answer 1:

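Steps:

    1. Create a row number rn over all rows, in case id is not in sequence.
    2. Create a row number approv_rn, partitioned by EmailApproved, so that we know when EmailApproved = 1 occurs for the second time.
    3. Use outer apply to find the row number of the second instance of EmailApproved = 1.
    4. In the where clause, filter out every row whose row number is >= the value found in step 3.
    5. If only 1 or 0 records with EmailApproved = 1 are available, the outer apply returns null; in that case all available rows are returned.
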
with test as
(
    select  *, 
            rn         = row_number() over (order by Created desc),
            approv_rn  = row_number() over (partition by EmailApproved 
                                                order by Created desc)
    from    @Test
)
select  *
from    test t
        outer apply
        (
            select  x.rn
            from    test x
            where   x.EmailApproved = 1
            and     x.approv_rn     = 2
        ) x
where   t.rn    < x.rn or x.rn is null
order by t.Created desc;
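
As a point of comparison, the same "stop before the second approved row" rule can also be sketched with a running total instead of outer apply. This is an alternative technique, not the approach used in the answer above. It assumes SQL Server 2012 or later (for the `rows between` window frame), and the names `running` and `ApprovedSoFar` are made up for illustration. Like the query above, it assumes Created values are distinct, so the ordering is deterministic.

```sql
-- Same sample data as in the question.
declare @Test table (id int, EmailApproved bit, Created datetime);

insert into @Test (id, EmailApproved, Created)
values
  (1,0,'2011-03-07 03:58:58.423')
  , (2,0,'2011-02-21 04:55:52.103')
  , (3,0,'2011-01-29 13:24:02.103')
  , (4,1,'2010-10-12 14:41:54.217')
  , (5,0,'2010-10-12 14:34:15.903')
  , (6,0,'2010-10-12 10:10:19.123')
  , (7,1,'2010-08-27 12:07:16.073')
  , (8,1,'2010-08-25 12:15:49.413')
  , (9,0,'2010-08-25 12:14:51.970')
  , (10,1,'2010-04-12 16:43:44.777');

with running as
(
    select  id, EmailApproved, Created,
            -- Count of approved rows seen so far, newest first.
            -- bit cannot be summed directly, hence the cast to int.
            ApprovedSoFar = sum(cast(EmailApproved as int))
                            over (order by Created desc
                                  rows between unbounded preceding
                                           and current row)
    from    @Test
)
select  id, EmailApproved, Created
from    running
-- Keep rows until the count reaches 2; the second approved row is excluded.
where   ApprovedSoFar < 2
order by Created desc;
```

With the sample data this keeps ids 1 through 6. When there are fewer than two approved rows the running count never reaches 2, so every row is returned without a separate null check.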
