Divide rows with date in SQL Server 2014
【Posted】: 2017-06-06 14:40:36
【Question】:
I have a problem with my SQL. I have the following table:
declare @t table (START_DATE datetime,
END_DATE datetime,
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t
values ('2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
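I want to split up the overlapping dates. For example, the first row has START_DATE 2014-08-06 and END_DATE 2014-10-06. We can see that the dates of the second and third rows fall inside the period of the first row.
So I want to split them as follows: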
declare @t2 table (START_DATE datetime,
END_DATE datetime,
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t2
values ('2014-08-06 00:00:00.000', '2014-09-05 23:59:59.000', 29.99),
('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
('2014-10-01 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
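So the second and third rows stay unchanged. The first row should get a new END_DATE, and there is also a new row. The GROSS_SALES_PRICE of the inner periods should stay as it is. Thanks for the help. I am using SQL Server 2014.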
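【Comments】:
+1 for adding the DDL/DML statements.
You know, you should have 23:59:59.999. Otherwise you might miss some data.
@JohnPasquet Actually it is 23:59:59.997. 23:59:59.999 would be rounded to 00:00:00.000.
Since you are using SQL Server 2014, I strongly recommend using DATE or DATETIME2(n) instead of the old, clunky DATETIME data type.
I'd recommend using [Closed; Open) intervals instead of [Closed; Closed]. In other words, use 2014-08-06 00:00:00.000, 2014-09-06 00:00:00.000 instead of 2014-08-06 00:00:00.000, 2014-09-05 23:59:59.000. Especially because for the datetime type 59.999 is rounded to 00.000, but for datetime2(3) it is not. You don't want to rely on such internal details of a data type.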
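The rounding behaviour mentioned in these comments can be checked directly. The following is a minimal sketch, not part of the original thread; the column aliases are made up for illustration, and the ISO 8601 literal form with a T separator is used so that the result does not depend on session language settings.

```sql
-- datetime is accurate to roughly 1/300 of a second, so the stored
-- milliseconds always end in 0, 3 or 7.
select
    cast('2014-09-05T23:59:59.999' as datetime)     as dt_999,   -- rounds up into the next day
    cast('2014-09-05T23:59:59.997' as datetime)     as dt_997,   -- last datetime value of the day
    cast('2014-09-05T23:59:59.999' as datetime2(3)) as dt2_999;  -- kept exactly as written
```

On SQL Server this should return 2014-09-06 00:00:00.000, 2014-09-05 23:59:59.997 and 2014-09-05 23:59:59.999 respectively, which is why an end value of 23:59:59.999 means different things for the two types.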
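【Answer 1】:
A calendar/dates table would simplify this, but we can also use a query to generate a temporary dates table with a common table expression.
From there, we can solve this as a gaps-and-islands style problem. Using the dates table and outer apply() to get the latest values of start_date and gross_sales_price for each date, we can identify the groups we want to re-aggregate by using two row_number()s. The first is simply ordered by date; from it we subtract a second one that is partitioned by the latest start_date we found and ordered by date.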
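The two-row_number() trick can be seen in isolation on a tiny data set. This is a simplified sketch rather than the answer's own query: the table variable @d and the CTE name numbered are hypothetical, there is one row per day, and the partition is by price instead of by the latest start_date.

```sql
declare @d table ([Date] date, price decimal(10,2));
insert into @d values
    ('2014-09-04', 29.99), ('2014-09-05', 29.99),
    ('2014-09-06', 32.99), ('2014-09-07', 32.99),
    ('2014-09-08', 29.99), ('2014-09-09', 29.99);

with numbered as (
    select [Date], price,
           -- consecutive days with the same price get the same difference
           grp = row_number() over (order by [Date])
               - row_number() over (partition by price order by [Date])
    from @d
)
select start_date = min([Date]),
       end_date   = max([Date]),
       price
from numbered
group by price, grp   -- grp alone is not unique across prices
order by start_date;
```

Here grp evaluates to 0, 0, 2, 2, 2, 2 for the six days, so the 32.99 island and the second 29.99 island share the value 2. Grouping by both price and grp, as the answer's src expression does with gross_sales_price and grp, keeps them apart and yields three islands: 09-04 to 09-05, 09-06 to 09-07 and 09-08 to 09-09.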
Then you can dump the results of the common table expression src into a temporary table and do your inserts/deletes from that, or you can use merge with src.
/* -- dates --*/
declare @fromdate datetime, @thrudate datetime;
select @fromdate = min(start_date), @thrudate = max(end_date) from #t;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(datetime,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
, [End_Date]=convert(datetime,dateadd(millisecond,-3,dateadd(day,row_number() over(order by (select 1)),@fromdate)))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date]
)
/* -- islands -- */
, cte as (
select
start_date = d.date
, end_date = d.end_date
, x.gross_sales_price
, grp = row_number() over (order by d.date)
- row_number() over (partition by x.start_date order by d.date)
from dates d
outer apply (
select top 1 l.start_date, l.gross_sales_price
from #t l
where d.date >= l.start_date
and d.date <= l.end_date
order by l.start_date desc
) x
)
/* -- aggregated islands -- */
, src as (
select
start_date = min(start_date)
, end_date = max(end_date)
, gross_sales_price
from cte
group by gross_sales_price, grp
)
/* -- merge -- */
merge #t with (holdlock) as target
using src as source
on target.start_date = source.start_date
and target.end_date = source.end_date
and target.gross_sales_price = source.gross_sales_price
when not matched by target
then insert (start_date, end_date, gross_sales_price)
values (start_date, end_date, gross_sales_price)
when not matched by source
then delete
output $action, inserted.*, deleted.*;
/* -- results -- */
select
start_date
, end_date
, gross_sales_price
from #t
order by start_date
rextester demo: http://rextester.com/MFXCQQ90933
merge output (you do not need to output this; it is here for demonstration only):
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| $action | START_DATE | END_DATE | GROSS_SALES_PRICE | START_DATE | END_DATE | GROSS_SALES_PRICE |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| INSERT | 2014-10-01 00:00:00 | 2014-10-06 23:59:59 | 29.99 | NULL | NULL | NULL |
| INSERT | 2014-08-06 00:00:00 | 2014-09-05 23:59:59 | 29.99 | NULL | NULL | NULL |
| DELETE | NULL | NULL | NULL | 2014-08-06 00:00:00 | 2014-10-06 23:59:59 | 29.99 |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
Results:
+-------------------------+-------------------------+-------------------+
| start_date | end_date | gross_sales_price |
+-------------------------+-------------------------+-------------------+
| 2014-08-06 00:00:00.000 | 2014-09-05 23:59:59.997 | 29.99 |
| 2014-09-06 00:00:00.000 | 2014-09-09 23:59:59.997 | 32.99 |
| 2014-09-10 00:00:00.000 | 2014-09-30 23:59:59.997 | 32.99 |
| 2014-10-01 00:00:00.000 | 2014-10-06 23:59:59.997 | 29.99 |
| 2014-10-07 00:00:00.000 | 2049-12-31 23:59:59.997 | 34.99 |
+-------------------------+-------------------------+-------------------+
Calendar and numbers table references:
Generate a set or sequence without loops 2 - Aaron Bertrand
Creating a Date Table/Dimension in SQL Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in SQL Server - Aaron Bertrand
merge references:
MERGE Statement - Aaron Bertrand
UPSERT Race Condition With Merge - Dan Guzman
An Interesting MERGE Bug - Paul White
Can I optimize this merge statement - Aaron Bertrand
If you are using indexed views and MERGE, please read this! - Aaron Bertrand
The Case of the Blocking Merge Statement (LCK_M_RS_U locks) - Kendra Little
Writing t-sql merge statements the right way - David Stein
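【Comments】:
Gaps-and-islands looks like overkill here. Also, you are limited to whole-day intervals. I added an alternative solution.
【Answer 2】:
Besides using the datetime2 type instead of datetime, I'd recommend using [Closed; Open) intervals instead of [Closed; Closed]. In other words, use 2014-08-06 00:00:00.000, 2014-09-06 00:00:00.000 instead of 2014-08-06 00:00:00.000, 2014-09-05 23:59:59.000. Specifically, because for the datetime type 59.999 is rounded to 00.000, but for datetime2(3) it is not. You don't want to rely on such internal details of a data type.
Also, [Closed; Open) intervals are much easier to work with in queries, as you'll see below.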
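As a small illustration of that point, here is a sketch of a point-in-time price lookup against half-open intervals. The names @p and @moment are hypothetical and the two rows are taken from the expected output in the question, rewritten in [Closed; Open) form.

```sql
declare @p table (START_DATE datetime2(3), END_DATE datetime2(3), PRICE decimal(10,2));
insert into @p values
    ('2014-08-06', '2014-09-06', 29.99),
    ('2014-09-06', '2014-09-10', 32.99);

-- The very last millisecond of 2014-09-05.
declare @moment datetime2(3) = '2014-09-05T23:59:59.999';

select PRICE
from @p
where @moment >= START_DATE
  and @moment <  END_DATE;   -- strict comparison on the open end
```

The lookup returns 29.99 whatever the precision of the column is. With a closed end of 2014-09-05 23:59:59.000 the same moment would match no row at all, and two rows are adjacent exactly when one END_DATE equals the next START_DATE.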
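The main idea is to put all start and end dates (boundaries) into one list, together with a flag that indicates whether the boundary is the start or the end of an interval. When the running total of the flag drops to 0, all overlapping intervals have ended.
Sample data
I extended your sample data with several cases of overlapping intervals.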
declare @t table
(START_DATE datetime2(0),
END_DATE datetime2(0),
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t
values
-- |------| 11
('2001-01-01 00:00:00', '2001-01-10 00:00:00', 11),
-- |------| 10
-- |------| 20
('2010-01-01 00:00:00', '2010-01-10 00:00:00', 10),
('2010-01-05 00:00:00', '2010-01-20 00:00:00', 20),
-- |----------| 30
-- |------| 40
('2010-02-01 00:00:00', '2010-02-20 00:00:00', 30),
('2010-02-05 00:00:00', '2010-02-20 00:00:00', 40),
-- |----------| 50
-- |----------| 60
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 50),
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 60),
-- |----------| 70
-- |------| 80
('2010-04-01 00:00:00', '2010-04-20 00:00:00', 70),
('2010-04-05 00:00:00', '2010-04-15 00:00:00', 80),
-- |-----------------------------| 29.99
-- |---------| 32.99
-- |---------| 32.99
-- |----------| 34.99
('2014-08-06 00:00:00', '2014-10-07 00:00:00', 29.99),
('2014-09-06 00:00:00', '2014-09-10 00:00:00', 32.99),
('2014-09-10 00:00:00', '2014-10-01 00:00:00', 32.99),
('2014-10-07 00:00:00', '2050-01-01 00:00:00', 34.99);
Query
WITH
CTE_Boundaries
AS
(
SELECT
START_DATE AS dt
,+1 AS Flag
,GROSS_SALES_PRICE AS Price
FROM @T
UNION ALL
SELECT
END_DATE AS dt
,-1 AS Flag
,GROSS_SALES_PRICE AS Price
FROM @T
)
,CTE_Intervals
AS
(
SELECT
dt
,Flag
,Price
,SUM(Flag) OVER (ORDER BY dt, Flag ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumFlag
,LEAD(dt) OVER (ORDER BY dt, Flag) AS NextDate
,LEAD(Price) OVER (ORDER BY dt, Flag) AS NextPrice
FROM CTE_Boundaries
)
SELECT
dt AS StartDate
,NextDate AS EndDate
,CASE WHEN Flag = 1 THEN Price ELSE NextPrice END AS Price
FROM CTE_Intervals
WHERE
SumFlag > 0
AND dt <> NextDate
ORDER BY StartDate
;
Results
+---------------------+---------------------+-------+
| StartDate | EndDate | Price |
+---------------------+---------------------+-------+
| 2001-01-01 00:00:00 | 2001-01-10 00:00:00 | 11.00 |
| 2010-01-01 00:00:00 | 2010-01-05 00:00:00 | 10.00 |
| 2010-01-05 00:00:00 | 2010-01-10 00:00:00 | 20.00 |
| 2010-01-10 00:00:00 | 2010-01-20 00:00:00 | 20.00 |
| 2010-02-01 00:00:00 | 2010-02-05 00:00:00 | 30.00 |
| 2010-02-05 00:00:00 | 2010-02-20 00:00:00 | 40.00 |
| 2010-03-01 00:00:00 | 2010-03-20 00:00:00 | 60.00 |
| 2010-04-01 00:00:00 | 2010-04-05 00:00:00 | 70.00 |
| 2010-04-05 00:00:00 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | 2010-04-20 00:00:00 | 70.00 |
Here is your sample data:
| 2014-08-06 00:00:00 | 2014-09-06 00:00:00 | 29.99 |
| 2014-09-06 00:00:00 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | 2050-01-01 00:00:00 | 34.99 |
+---------------------+---------------------+-------+
Intermediate results of CTE_Intervals
Examine these to understand how the query works.
+---------------------+------+-------+---------+---------------------+-----------+
| dt | Flag | Price | SumFlag | NextDate | NextPrice |
+---------------------+------+-------+---------+---------------------+-----------+
| 2001-01-01 00:00:00 | 1 | 11.00 | 1 | 2001-01-10 00:00:00 | 11.00 |
| 2001-01-10 00:00:00 | -1 | 11.00 | 0 | 2010-01-01 00:00:00 | 10.00 |
| 2010-01-01 00:00:00 | 1 | 10.00 | 1 | 2010-01-05 00:00:00 | 20.00 |
| 2010-01-05 00:00:00 | 1 | 20.00 | 2 | 2010-01-10 00:00:00 | 10.00 |
| 2010-01-10 00:00:00 | -1 | 10.00 | 1 | 2010-01-20 00:00:00 | 20.00 |
| 2010-01-20 00:00:00 | -1 | 20.00 | 0 | 2010-02-01 00:00:00 | 30.00 |
| 2010-02-01 00:00:00 | 1 | 30.00 | 1 | 2010-02-05 00:00:00 | 40.00 |
| 2010-02-05 00:00:00 | 1 | 40.00 | 2 | 2010-02-20 00:00:00 | 30.00 |
| 2010-02-20 00:00:00 | -1 | 30.00 | 1 | 2010-02-20 00:00:00 | 40.00 |
| 2010-02-20 00:00:00 | -1 | 40.00 | 0 | 2010-03-01 00:00:00 | 50.00 |
| 2010-03-01 00:00:00 | 1 | 50.00 | 1 | 2010-03-01 00:00:00 | 60.00 |
| 2010-03-01 00:00:00 | 1 | 60.00 | 2 | 2010-03-20 00:00:00 | 50.00 |
| 2010-03-20 00:00:00 | -1 | 50.00 | 1 | 2010-03-20 00:00:00 | 60.00 |
| 2010-03-20 00:00:00 | -1 | 60.00 | 0 | 2010-04-01 00:00:00 | 70.00 |
| 2010-04-01 00:00:00 | 1 | 70.00 | 1 | 2010-04-05 00:00:00 | 80.00 |
| 2010-04-05 00:00:00 | 1 | 80.00 | 2 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | -1 | 80.00 | 1 | 2010-04-20 00:00:00 | 70.00 |
| 2010-04-20 00:00:00 | -1 | 70.00 | 0 | 2014-08-06 00:00:00 | 29.99 |
| 2014-08-06 00:00:00 | 1 | 29.99 | 1 | 2014-09-06 00:00:00 | 32.99 |
| 2014-09-06 00:00:00 | 1 | 32.99 | 2 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | -1 | 32.99 | 1 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 1 | 32.99 | 2 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | -1 | 32.99 | 1 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | -1 | 29.99 | 0 | 2014-10-07 00:00:00 | 34.99 |
| 2014-10-07 00:00:00 | 1 | 34.99 | 1 | 2050-01-01 00:00:00 | 34.99 |
| 2050-01-01 00:00:00 | -1 | 34.99 | 0 | NULL | NULL |
+---------------------+------+-------+---------+---------------------+-----------+
【Comments】:
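This is very concise and works well on the sample data, but there are cases of overlapping intervals where the price comes out wrong: when there is a gap between two islands that sit inside the same outer interval. For example, for (0,9,1),(2,3,2),(6,7,3) your query returns the prices 1,2,3,3,1 instead of 1,2,1,3,1. This is easily corrected by decoupling the determination of the price from the determination of the intervals and using cross apply() (or something similar) to get the applicable price for a given interval. rextester demo: rextester.com/SKFJN88943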
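One way the decoupling described in this comment might look is sketched below. This is not the code from the linked demo: the CTE names boundaries and intervals are made up, integer boundaries from the comment's example stand in for dates, and the rule "the covering row with the latest start wins" is an assumption borrowed from Answer 1.

```sql
declare @t table (START_DATE int, END_DATE int, PRICE int);
insert into @t values (0, 9, 1), (2, 3, 2), (6, 7, 3);

with boundaries as (
    -- every distinct start or end value is a cut point
    select START_DATE as dt from @t
    union
    select END_DATE from @t
),
intervals as (
    -- consecutive cut points form half-open pieces [StartDate, EndDate)
    select dt as StartDate,
           lead(dt) over (order by dt) as EndDate
    from boundaries
)
select i.StartDate, i.EndDate, p.PRICE
from intervals as i
cross apply (
    -- look the price up separately: latest-starting row that covers the piece
    select top (1) t.PRICE
    from @t as t
    where t.START_DATE <= i.StartDate
      and t.END_DATE   >  i.StartDate
    order by t.START_DATE desc
) as p
where i.EndDate is not null
order by i.StartDate;
```

For the three rows above this returns the pieces [0,2), [2,3), [3,6), [6,7), [7,9) with prices 1, 2, 1, 3, 1. Pieces that no row covers are dropped by cross apply, and if two covering rows share the same START_DATE the top (1) choice is arbitrary, so a tie-breaker would be needed in that case.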
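@SqlZim, yes, you are right. My method of getting the price is flawed. Getting the price with an explicit lookup is straightforward, but it is not the most efficient method. I vaguely remember some clever way of implementing a stack (FILO) with window functions. Essentially, when the flag increments (a new interval starts), the corresponding price is pushed onto the stack. When the flag decrements (an interval ends), the price is popped off the stack. I can't find an example right now. Maybe I'll come back to this question later.
【Answer 3】:
Note: the following solution comes with some assumptions
[1] It uses the LEAD function => SQL2012+
[2] All DATETIME columns are mandatory => NOT NULL
[3] All DATETIME values (across both columns) are unique.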
select y.*
from (
select t.ID, x.DT AS NEW_START_DATE, DATEADD(MILLISECOND, -3, LEAD(x.DT) OVER(ORDER BY x.DT ASC)) AS NEW_END_DATE
from @t as t
outer apply (
select t.START_DATE, 1
union all
select t.END_DATE, 2
) as x(DT, [TYPE])
) as y
where y.NEW_END_DATE IS NOT NULL
order by y.NEW_START_DATE
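【Comments】:
Note: if your DATETIME values are truncated to seconds (milliseconds = 000), then DATEADD(SECOND, -1 should be used instead of DATEADD(MILLISECOND, -3, but I have to say that this is not safe (without some CHECK constraints on those two DT columns).
This code does not return the correct results. It returns seven rows, without gross_sales_price, and some of your ranges last less than a second. rextester.com/AFL95562
【Answer 4】:
This can be solved with simple joins and unions. It is better to have an ID, though. The common table expression just adds an ID.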
declare @t table(START_DATE datetime,END_DATE datetime, GROSS_SALES_PRICE
decimal(10,2));
insert into @t values
( '2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
( '2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
( '2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
( '2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
;with t_cte as
(select row_number() over( order by start_date,end_date,GROSS_SALES_PRICE) ID,*
from @t
)
select t1.start_date,min(t2.start_date),t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select min(t2.start_date),t1.end_date,t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select t1.start_date,t1.END_DATE,t1.GROSS_SALES_PRICE
from t_cte t1
left join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
where t2.id is null
order by 1,2,3
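【Comments】:
Your results contain overlaps and do not match the expected results posted by the OP: rextester.com/DOP32733
【Answer 5】:
How about using LEAD to look up the value from the next row: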
SELECT START_DATE,
CASE
WHEN LEAD(Start_Date) OVER (ORDER BY Start_Date) < END_DATE
THEN COALESCE(DATEADD(s, -1, LEAD(Start_Date) OVER (ORDER BY Start_Date)), END_Date)
ELSE END_DATE END AS End_Date,
GROSS_SALES_PRICE
FROM @t
Or using a common table expression:
;WITH CTE
AS
(
SELECT Start_date,
End_Date,
LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
GROSS_SALES_PRICE
FROM @t
)
SELECT START_DATE,
CASE WHEN NextStartDate < END_DATE
THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date)
ELSE End_date END As End_Date,
GROSS_SALES_PRICE
FROM CTE
Updated to add the missing row:
;WITH CTE
AS
(
SELECT Start_date,
End_Date,
LAG(END_Date) OVER (ORDER BY Start_Date) AS PreviousEndDate,
LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
GROSS_SALES_PRICE
FROM @t
)
SELECT START_DATE,
CASE WHEN NextStartDate < END_DATE
THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date)
ELSE End_date END As End_Date,
GROSS_SALES_PRICE
FROM CTE
UNION ALL
SELECT DATEADD(s, 1, PreviousEndDate), DATEADD(s, -1, Start_Date), GROSS_SALES_PRICE
FROM CTE
WHERE DATEDIFF(s, PreviousEndDate,Start_Date) > 1
ORDER BY 1
【Comments】:
This solves the first part of the overlap, but not the second. One row is missing: from 2014-10-01 to 2014-10-06, with a price of 29.99. rextester.com/XSXXPX23641
@SqlZim Updated to add the missing row, using UNION and Lead & Lag.