Filter out records within n days
【Posted】: 2018-08-24 22:12:23
【Question】: I'm not sure what to call this challenge.
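I want to flag (so I can filter them out later) certain records in a data set partitioned by the TransTypeID column: those whose date falls within n days (3 in this example) of the date of the first record in the partition. That part is simple. But within the same partition, if more records appear after the 3-day limit, the next record becomes the new "first" record and starts a new chain that flags every subsequent record within 3 days of it. And so on.
The screenshot illustrates the desired output: the rows highlighted in yellow are the ones I want to flag/filter out. All other rows are kept.
I've sprayed and prayed with window functions and the like, but can't seem to find an elegant solution. How would you solve this in T-SQL?
sqlfiddle isn't responding for sql-server at the moment, so I'm posting the DDL code here: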
DROP TABLE IF EXISTS [dbo].[testTable];
CREATE TABLE [dbo].[testTable](
    [RowID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [CustID] [int] NULL,
    [TransTypeID] [int] NULL,
    [Date] [date] NULL
)
GO
SET IDENTITY_INSERT [dbo].[testTable] ON
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (1, 9362, 1, CAST(N'2018-01-11' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (2, 9362, 1, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (3, 9362, 2, CAST(N'2018-01-04' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (4, 9362, 2, CAST(N'2018-01-07' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (5, 9362, 2, CAST(N'2018-01-09' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (6, 9362, 2, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (7, 9362, 2, CAST(N'2018-01-23' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (8, 9362, 2, CAST(N'2018-01-24' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (9, 9362, 2, CAST(N'2018-01-26' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (10, 9362, 3, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (11, 9362, 5, CAST(N'2018-01-01' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (12, 9362, 5, CAST(N'2018-01-02' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (13, 9362, 5, CAST(N'2018-01-02' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (14, 9362, 5, CAST(N'2018-01-04' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (15, 9362, 5, CAST(N'2018-01-07' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (16, 9362, 5, CAST(N'2018-01-17' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (17, 9362, 5, CAST(N'2018-02-08' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (18, 9362, 5, CAST(N'2018-02-18' AS Date))
GO
SET IDENTITY_INSERT [dbo].[testTable] OFF
GO
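【Answer 1】: A recursive CTE should be able to do this. First, SELECT the row with the earliest date in each group; row_number() can be used for that. Then recursively UNION ALL, for each group, the row with the earliest date that is later than the latest date already in the result plus 3 days, thereby skipping the 3-day window. row_number() can be used here as well, with dateadd() for the date arithmetic.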
WITH [cte]
AS
(
    -- Anchor member: the earliest row of each (CustID, TransTypeID) group.
    SELECT [x].[RowID],
           [x].[CustID],
           [x].[TransTypeId],
           [x].[Date]
    FROM (SELECT [testTable].[RowID],
                 [testTable].[CustID],
                 [testTable].[TransTypeId],
                 [testTable].[Date],
                 row_number() OVER (PARTITION BY [testTable].[CustId],
                                                 [testTable].[TransTypeID]
                                    ORDER BY [testTable].[Date]) [row#]
          FROM [dbo].[testTable]) [x]
    WHERE [x].[row#] = 1
    UNION ALL
    -- Recursive member: the earliest row in the same group that is more than
    -- 3 days after the row kept in the previous step starts the next chain.
    SELECT [x].[RowID],
           [x].[CustID],
           [x].[TransTypeId],
           [x].[Date]
    FROM (SELECT [testTable].[RowID],
                 [testTable].[CustID],
                 [testTable].[TransTypeId],
                 [testTable].[Date],
                 row_number() OVER (PARTITION BY [testTable].[CustId],
                                                 [testTable].[TransTypeID]
                                    ORDER BY [testTable].[Date]) [row#]
          FROM [dbo].[testTable]
               INNER JOIN [cte]
                          ON [cte].[CustId] = [testTable].[CustId]
                             AND [cte].[TransTypeId] = [testTable].[TransTypeID]
                             AND dateadd(day, 3, [cte].[Date]) < [testTable].[Date]) [x]
    WHERE [x].[row#] = 1
)
-- Only the rows that start a chain are returned; everything else was within 3 days.
SELECT *
FROM [cte]
ORDER BY [cte].[CustID],
         [cte].[TransTypeID],
         [cte].[Date];
Result:
RowID | CustID | TransTypeId | Date
----: | -----: | ----------: | :------------------
1 | 9362 | 1 | 11/01/2018 00:00:00
2 | 9362 | 1 | 22/01/2018 00:00:00
3 | 9362 | 2 | 04/01/2018 00:00:00
5 | 9362 | 2 | 09/01/2018 00:00:00
6 | 9362 | 2 | 22/01/2018 00:00:00
9 | 9362 | 2 | 26/01/2018 00:00:00
10 | 9362 | 3 | 22/01/2018 00:00:00
11 | 9362 | 5 | 01/01/2018 00:00:00
15 | 9362 | 5 | 07/01/2018 00:00:00
16 | 9362 | 5 | 17/01/2018 00:00:00
17 | 9362 | 5 | 08/02/2018 00:00:00
18 | 9362 | 5 | 18/02/2018 00:00:00
db<>fiddle
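(I assumed that the groups are defined not only by [TransTypeID] but also by [CustID]. That wasn't entirely clear to me. If my assumption is wrong, remove [CustID] from the PARTITION BY clauses, along with the matching [CustId] condition in the join.)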
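【Discussion】:
Yes, CustID is part of the group as well. It works perfectly! I wanted to flag the records rather than filter them out right away, so I insert the result into a temp table, join it to the original table, and flag the rows with no match as 0 and the others as 1. Nice!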
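The flagging step described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the temp table name `#kept` and the output column `[KeepFlag]` are hypothetical (the commenter did not post their code), and the RowID values are copied from the result listing in the answer rather than produced by re-running the recursive CTE.

```sql
-- Hypothetical temp table holding the RowIDs returned by the recursive CTE.
-- The values are copied from the answer's result listing; in practice one
-- would populate it with INSERT INTO #kept SELECT [RowID] FROM [cte].
DROP TABLE IF EXISTS #kept;
CREATE TABLE #kept ([RowID] int NOT NULL PRIMARY KEY);

INSERT INTO #kept ([RowID])
VALUES (1), (2), (3), (5), (6), (9), (10), (11), (15), (16), (17), (18);

-- LEFT JOIN keeps every original row; rows with no match in #kept get 0
-- (inside a 3-day window), matched rows get 1 (they start a chain).
SELECT [t].[RowID],
       [t].[CustID],
       [t].[TransTypeID],
       [t].[Date],
       CASE WHEN [k].[RowID] IS NULL THEN 0 ELSE 1 END AS [KeepFlag]
FROM [dbo].[testTable] AS [t]
     LEFT JOIN #kept AS [k]
               ON [k].[RowID] = [t].[RowID]
ORDER BY [t].[CustID],
         [t].[TransTypeID],
         [t].[Date],
         [t].[RowID];
```

With the sample data from the question, RowIDs 4, 7, 8, 12, 13 and 14 are the ones missing from `#kept`, so they come out flagged 0. A LEFT JOIN is used instead of filtering so that the original rows all stay visible and the flag can be applied later.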