T-sql Reset Row number on Field Change
【Posted】: 2012-11-04 12:14:14
【Question】: Similar to a recent post of mine, "t-sql sequential duration", but not exactly the same: I want to reset the row number based on a change in column x (in my case, the column "who").
Here is the first query, which returns a small sample of the raw(ish) data:
SELECT DISTINCT chr.custno,
CAST(LEFT(CONVERT( VARCHAR(20),chr.moddate,112),10)+ ' ' + chr.modtime AS DATETIME)as moddate,
chr.who
FROM <TABLE> chr
WHERE chr.custno = 581827
AND LEFT(chr.who, 5) = 'EMSZC'
AND chr.[description] NOT LIKE 'Recalled and viewed this customer'
ORDER BY chr.custno
Results:
custno moddate who
581827 2012-11-08 08:38:00.000 EMSZC14
581827 2012-11-08 08:41:10.000 EMSZC14
581827 2012-11-08 08:53:46.000 EMSZC14
581827 2012-11-08 08:57:04.000 EMSZC14
581827 2012-11-08 08:58:35.000 EMSZC14
581827 2012-11-08 08:59:13.000 EMSZC14
581827 2012-11-08 09:00:06.000 EMSZC14
581827 2012-11-08 09:04:39.000 EMSZC49 Reset row number to 1
581827 2012-11-08 09:05:04.000 EMSZC49
581827 2012-11-08 09:06:32.000 EMSZC49
581827 2012-11-08 09:12:03.000 EMSZC49
581827 2012-11-08 09:12:38.000 EMSZC49
581827 2012-11-08 09:14:18.000 EMSZC49
581827 2012-11-08 09:17:35.000 EMSZC14 Reset row number to 1
The second step is to add a row number (I didn't do this in the first query because of the DISTINCT keyword); so...
WITH c1 AS (
SELECT DISTINCT chr.custno,
CAST(LEFT(CONVERT( VARCHAR(20),chr.moddate,112),10)+ ' ' + chr.modtime AS DATETIME)as moddate,
chr.who
FROM <TABLE> chr
WHERE chr.custno = 581827
AND LEFT(chr.who, 5) = 'EMSZC'
AND chr.[description] NOT LIKE 'Recalled and viewed this customer'
)
SELECT ROW_NUMBER() OVER (PARTITION BY custno ORDER BY custno, moddate, who) AS RowID, custno, moddate, who
FROM c1
Results:
RowID custno moddate who
1 581827 2012-11-08 08:38:00.000 EMSZC14
2 581827 2012-11-08 08:41:10.000 EMSZC14
3 581827 2012-11-08 08:53:46.000 EMSZC14
4 581827 2012-11-08 08:57:04.000 EMSZC14
5 581827 2012-11-08 08:58:35.000 EMSZC14
6 581827 2012-11-08 08:59:13.000 EMSZC14
7 581827 2012-11-08 09:00:06.000 EMSZC14
8 581827 2012-11-08 09:04:39.000 EMSZC49 Reset row number to 1
9 581827 2012-11-08 09:05:04.000 EMSZC49
10 581827 2012-11-08 09:06:32.000 EMSZC49
11 581827 2012-11-08 09:12:03.000 EMSZC49
12 581827 2012-11-08 09:12:38.000 EMSZC49
13 581827 2012-11-08 09:14:18.000 EMSZC49
14 581827 2012-11-08 09:17:35.000 EMSZC14 Reset row number to 1
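The next step is where I'm stuck: the goal is to reset RowID to 1 every time the value in the "who" column changes. The code below gets an "almost there" result (I should note that I stole/borrowed this code from somewhere, but I can't find the site now):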
WITH c1 AS (
SELECT DISTINCT chr.custno,
CAST(LEFT(CONVERT( VARCHAR(20),chr.moddate,112),10)+ ' ' + chr.modtime AS DATETIME)as moddate,
chr.who
FROM <TABLE> chr
WHERE chr.custno = 581827
AND LEFT(chr.who, 5) = 'EMSZC'
AND chr.[description] NOT LIKE 'Recalled and viewed this customer'
)
, c1a AS (
SELECT ROW_NUMBER() OVER (PARTITION BY custno ORDER BY custno, moddate, who) AS RowID, custno, moddate, who
FROM c1
)
SELECT x.RowID - y.MinID + 1 AS Row,
x.custno, x.moddate, x.who
FROM (
SELECT custno, who, MIN(RowID) AS MinID
FROM c1a
GROUP BY custno, who
) AS y
INNER JOIN c1a x ON x.custno = y.custno AND x.who = y.who
Results:
Row custno moddate who
1 581827 2012-11-08 08:38:00.000 EMSZC14
2 581827 2012-11-08 08:41:10.000 EMSZC14
3 581827 2012-11-08 08:53:46.000 EMSZC14
4 581827 2012-11-08 08:57:04.000 EMSZC14
5 581827 2012-11-08 08:58:35.000 EMSZC14
6 581827 2012-11-08 08:59:13.000 EMSZC14
7 581827 2012-11-08 09:00:06.000 EMSZC14
1 581827 2012-11-08 09:04:39.000 EMSZC49 Reset row number to 1 (Hooray! It worked!)
2 581827 2012-11-08 09:05:04.000 EMSZC49
3 581827 2012-11-08 09:06:32.000 EMSZC49
4 581827 2012-11-08 09:12:03.000 EMSZC49
5 581827 2012-11-08 09:12:38.000 EMSZC49
6 581827 2012-11-08 09:14:18.000 EMSZC49
14 581827 2012-11-08 09:17:35.000 EMSZC14 Reset row number to 1 (Crappies.)
Desired results:
Row custno moddate who
1 581827 2012-11-08 08:38:00.000 EMSZC14
2 581827 2012-11-08 08:41:10.000 EMSZC14
3 581827 2012-11-08 08:53:46.000 EMSZC14
4 581827 2012-11-08 08:57:04.000 EMSZC14
5 581827 2012-11-08 08:58:35.000 EMSZC14
6 581827 2012-11-08 08:59:13.000 EMSZC14
7 581827 2012-11-08 09:00:06.000 EMSZC14
1 581827 2012-11-08 09:04:39.000 EMSZC49 Reset row number to 1
2 581827 2012-11-08 09:05:04.000 EMSZC49
3 581827 2012-11-08 09:06:32.000 EMSZC49
4 581827 2012-11-08 09:12:03.000 EMSZC49
5 581827 2012-11-08 09:12:38.000 EMSZC49
6 581827 2012-11-08 09:14:18.000 EMSZC49
1 581827 2012-11-08 09:17:35.000 EMSZC14 Reset row number to 1
Any help is appreciated.
【Comments】:
【Answer 1】: If you are on SQL Server 2012, you can use LAG to compare each value with the one on the previous row, and SUM with OVER to keep a running count of the changes.
with C1 as
(
select custno,
moddate,
who,
lag(who) over(order by moddate) as lag_who
from chr
),
C2 as
(
select custno,
moddate,
who,
sum(case when who = lag_who then 0 else 1 end)
over(order by moddate rows unbounded preceding) as change
from C1
)
select row_number() over(partition by change order by moddate) as RowID,
custno,
moddate,
who
from C2
SQL Fiddle
Update:
A version for SQL Server 2005. It uses a recursive CTE, with a temp table as intermediate storage for the data you need to iterate over.
create table #tmp
(
id int primary key,
custno int not null,
moddate datetime not null,
who varchar(10) not null
);
insert into #tmp(id, custno, moddate, who)
select row_number() over(order by moddate),
custno,
moddate,
who
from chr;
with C as
(
select 1 as rowid,
T.id,
T.custno,
T.moddate,
T.who,
cast(null as varchar(10)) as lag_who
from #tmp as T
where T.id = 1
union all
select case when T.who = C.who then C.rowid + 1 else 1 end,
T.id,
T.custno,
T.moddate,
T.who,
C.who
from #tmp as T
inner join C
on T.id = C.id + 1
)
select rowid,
custno,
moddate,
who
from C
option (maxrecursion 0);
drop table #tmp;
SQL Fiddle
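For SQL Server 2005 or 2008, a set-based alternative that needs neither LAG nor recursion is the "difference of two row numbers" idea, often filed under gaps-and-islands problems. This is a different technique from the one in the answer above, offered only as a minimal sketch. It assumes the rows can be ordered unambiguously by moddate within each custno; the table variable @chr and its rows are made up for illustration (a subset of the question's data).

```sql
-- Hypothetical sample data (a subset of the rows from the question).
DECLARE @chr TABLE (custno INT, moddate DATETIME, who VARCHAR(10));

INSERT INTO @chr (custno, moddate, who)
SELECT 581827, '2012-11-08T08:38:00', 'EMSZC14' UNION ALL
SELECT 581827, '2012-11-08T08:41:10', 'EMSZC14' UNION ALL
SELECT 581827, '2012-11-08T09:04:39', 'EMSZC49' UNION ALL
SELECT 581827, '2012-11-08T09:05:04', 'EMSZC49' UNION ALL
SELECT 581827, '2012-11-08T09:17:35', 'EMSZC14';

WITH numbered AS
(
    SELECT custno, moddate, who,
           -- Position within the customer minus position within the
           -- (customer, who) pair. The difference stays constant inside
           -- each unbroken run of the same who value.
           ROW_NUMBER() OVER (PARTITION BY custno ORDER BY moddate)
         - ROW_NUMBER() OVER (PARTITION BY custno, who ORDER BY moddate) AS grp
    FROM @chr
)
-- Number the rows inside each run; a new run starts again at 1.
SELECT ROW_NUMBER() OVER (PARTITION BY custno, who, grp ORDER BY moddate) AS RowID,
       custno, moddate, who
FROM numbered
ORDER BY custno, moddate;
```

On these five rows RowID should come out as 1, 2, 1, 2, 1: the last EMSZC14 row lands in its own (who, grp) group and restarts at 1. If two rows for the same customer share a moddate, the ordering is not deterministic, so a tiebreaker column would be needed.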
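【Comments】:
Unfortunately we are still running 2005, with plans to upgrade to 2008R2 next month. The LAG function would be a clean way to solve this. Thanks for bringing it to my attention... just one more argument for moving to SQL Server 2012.
Very cool solution. My only concern (which didn't actually pan out) is the use of maxrecursion 0. That could lead to an infinite loop, but given my data structure it probably isn't an issue for the data I have. Thanks!!!
This LAG function is exactly what I needed! Thanks!
【Answer 2】: I solved this successfully by using RANK():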
SELECT RANK() OVER (PARTITION BY who ORDER BY custno, moddate) AS RANK
This returned the results you want. I actually found this post while trying to solve the same problem.
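One caveat worth checking before relying on this: partitioning by who alone puts every row for the same user into one partition, so a user who reappears after someone else continues the earlier count instead of restarting. A small sketch on made-up rows (a subset of the question's data; the table variable @chr and the alias RowRank are hypothetical) illustrates this:

```sql
-- Hypothetical sample data (a subset of the rows from the question).
DECLARE @chr TABLE (custno INT, moddate DATETIME, who VARCHAR(10));

INSERT INTO @chr (custno, moddate, who)
SELECT 581827, '2012-11-08T08:38:00', 'EMSZC14' UNION ALL
SELECT 581827, '2012-11-08T08:41:10', 'EMSZC14' UNION ALL
SELECT 581827, '2012-11-08T09:04:39', 'EMSZC49' UNION ALL
SELECT 581827, '2012-11-08T09:05:04', 'EMSZC49' UNION ALL
SELECT 581827, '2012-11-08T09:17:35', 'EMSZC14';

-- The final EMSZC14 row shares a partition with the first two EMSZC14
-- rows, so it is ranked after them rather than starting over.
SELECT RANK() OVER (PARTITION BY who ORDER BY custno, moddate) AS RowRank,
       custno, moddate, who
FROM @chr
ORDER BY custno, moddate;
```

Here RowRank should come out as 1, 2, 1, 2, 3, so the approach fits data where each who value forms a single unbroken run, but not the question's sample, where EMSZC14 returns after EMSZC49.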
【Comments】:
【Answer 3】: Instead of:
PARTITION BY custno ORDER BY custno, moddate, who)
Try:
PARTITION BY custno, who ORDER BY custno, moddate)
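【Comments】:
Already tried that (it seems like it should work, doesn't it?) but it still returns the same result set, with the last row numbered 14. Keep throwing ideas at me, though. I'm just missing some simple step, I'm sure.
【Answer 4】: The only solution I can think of is to use a cursor (ugh) and go through the RBAR process. It's not an elegant solution, since the cursor would have to read more than 1m rows. Bummer.
【Comments】: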