Detect consecutive date ranges using SQL

【Posted】: 2013-12-05 14:05:39

【Question】:

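I want to populate calendar objects, which need start and end date information. I have a column containing a sequence of dates. Some of the dates are consecutive (one day apart) and others are not.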

InfoDate  

2013-12-04  consecutive date [StartDate]
2013-12-05  consecutive date
2013-12-06  consecutive date [EndDate]

2013-12-09                   [startDate]
2013-12-10                   [EndDate]

2014-01-01                   [startDate]
2014-01-02 
2014-01-03                   [EndDate]

2014-01-06                   [startDate]
2014-01-07                   [EndDate]

2014-01-29                   [startDate]
2014-01-30 
2014-01-31                   [EndDate]

2014-02-03                   [startDate]
2014-02-04                   [EndDate]

I want to select the start and end date of each consecutive date range (the first and last date in each block).

StartDate     EndDate

2013-12-04    2013-12-06
2013-12-09    2013-12-10
2014-01-01    2014-01-03
2014-01-06    2014-01-07
2014-01-29    2014-01-31
2014-02-03    2014-02-04

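I want to solve the problem using SQL only.

【Question comments】:

What do the blank lines in the second listing mean?

Do you really need to solve this in SQL? It seems hard to express in SQL (at least in standard SQL), whereas the obvious algorithm is essentially sequential and is easy to write in a procedural language. If SQL really is required, I would use a stored procedure.

【Answer 1】: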

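No joins or recursive CTEs are needed. The standard gaps-and-islands solution is to group by (value minus row_number), because that difference is constant within a consecutive sequence. The start and end dates are then just the MIN() and MAX() of each group.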

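-- @d is a table variable holding the InfoDate column from the question.
WITH t AS (
  -- GROUP BY InfoDate removes duplicate dates before they are numbered
  SELECT InfoDate AS d, ROW_NUMBER() OVER (ORDER BY InfoDate) AS i
  FROM @d
  GROUP BY InfoDate
)
SELECT MIN(d) AS StartDate, MAX(d) AS EndDate
FROM t
GROUP BY DATEDIFF(day, i, d)  -- constant within each run of consecutive dates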
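
To make the "value minus row_number" invariant concrete, here is a minimal Python sketch of the same grouping idea, run on the dates from the question. It is an illustration only, not the answer's SQL: the function name `consecutive_ranges` and the variable `info_dates` are made up for this example.

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_ranges(dates):
    """Return (start, end) pairs, one per run of consecutive dates."""
    # Deduplicate and sort, mirroring GROUP BY InfoDate and ORDER BY InfoDate.
    ordered = sorted(set(dates))
    # Number the rows from 1, like ROW_NUMBER().
    numbered = list(enumerate(ordered, start=1))
    ranges = []
    # (date - row number in days) is the same for every date in one run,
    # and it only grows at a gap, so adjacent grouping is enough here.
    for _, run in groupby(numbered, key=lambda p: p[1] - timedelta(days=p[0])):
        run_dates = [d for _, d in run]
        ranges.append((run_dates[0], run_dates[-1]))
    return ranges


# The InfoDate values from the question.
info_dates = [
    date(2013, 12, 4), date(2013, 12, 5), date(2013, 12, 6),
    date(2013, 12, 9), date(2013, 12, 10),
    date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 3),
    date(2014, 1, 6), date(2014, 1, 7),
    date(2014, 1, 29), date(2014, 1, 30), date(2014, 1, 31),
    date(2014, 2, 3), date(2014, 2, 4),
]

for start, end in consecutive_ranges(info_dates):
    print(start, end)
```

Each run collapses to a single key because the date and the row number both advance by one per row; at a gap the date jumps ahead while the row number does not, so the key changes.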

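【Comments】:

Very clever solution. Thanks!

I think the group by should be changed to: GROUP BY DATEADD(day,-i,d)

@BennyBechDk GROUP BY DATEDIFF(day,i,d) and GROUP BY DATEADD(day,-i,d) will produce the same groups.

You may have been downvoted for saying "no need for CTEs" and then using a CTE! But of course you could inline the CTE in place of t in the final SELECT, so you are still right...

Hi TommCatt, sorry, it does not work for input given in the form of StartDate and EndDate.

【Answer 2】:

Here you go: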

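;WITH CTEDATES
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY Infodate asc ) AS ROWNUMBER,infodate FROM YourTableName
),
 CTEDATES1
AS
(
   -- anchor: the earliest date opens group 1
   SELECT ROWNUMBER, infodate, 1 as groupid FROM CTEDATES WHERE ROWNUMBER=1
   UNION ALL
   -- step: stay in the group if the date is one day after the previous row, otherwise open a new group
   SELECT a.ROWNUMBER, a.infodate,case datediff(d, b.infodate,a.infodate) when 1 then b.groupid else b.groupid+1 end as groupid FROM CTEDATES A INNER JOIN CTEDATES1 B ON A.ROWNUMBER-1 = B.ROWNUMBER
)

select min(infodate) as startdate, max(infodate) as enddate from CTEDATES1 group by groupid
option (maxrecursion 0) -- one recursion step per row; the default limit of 100 fails on longer tables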

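【Comments】:

You should use OVER (ORDER BY Infodate) instead of OVER (ORDER BY (SELECT 1)). Also, change min(mydate) to min(infodate). Other than that, this is a good answer.

【Answer 3】:

I inserted the values into a table named #consec and then did the following: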

-- #temp1: binfod is NULL when the following day is missing, so the row ends a range
select t1.*
,t2.infodate as binfod
into #temp1
from #consec t1
left join #consec t2 on dateadd(DAY,1,t1.infodate)=t2.infodate

-- #temp2: binfod is NULL when the preceding day is missing, so the row starts a range
select t1.*
,t2.infodate as binfod
into #temp2
from #consec t1
left join #consec t2 on dateadd(DAY,1,t2.infodate)=t1.infodate

-- number the range ends (cte) and range starts (cte2) so they can be paired up
;with cte as(
select infodate,  ROW_NUMBER() over(order by infodate asc) as seq from #temp1
where binfod is null
),
cte2 as(
select infodate, ROW_NUMBER() over(order by infodate asc) as seq from #temp2
where binfod is null
)

select t2.infodate as [start_date]
,t1.infodate as [end_date] from cte t1
left join cte2 t2 on t1.seq=t2.seq

As long as your date periods do not overlap, that should do the job for you.
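
The idea behind this answer (a range starts at a date whose previous day is absent, ends at a date whose next day is absent, and the n-th start pairs with the n-th end) can be sketched outside SQL as follows. This is a rough Python illustration under the assumption that duplicate dates are irrelevant; `ranges_by_boundaries` and `sample` are names invented for the example, and the sample dates are the test data used in Answer 4.

```python
from datetime import date, timedelta


def ranges_by_boundaries(dates):
    """Pair range starts with range ends, as the two left joins do."""
    present = set(dates)
    one_day = timedelta(days=1)
    # A start has no date on the previous day (the #temp2 rows with NULL).
    starts = sorted(d for d in present if d - one_day not in present)
    # An end has no date on the following day (the #temp1 rows with NULL).
    ends = sorted(d for d in present if d + one_day not in present)
    # Every range has exactly one start and one end, so pairing the
    # sorted lists position by position matches them up.
    return list(zip(starts, ends))


sample = [date(2013, 1, n) for n in (1, 2, 3, 5, 8, 9, 12, 13, 14)]

for start, end in ranges_by_boundaries(sample):
    print(start, end)
```

A date with no neighbours on either side, such as 2013-01-05 here, is both a start and an end, so it comes out as a one-day range.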

【Comments】:

【Answer 4】:

Here is a sample with my test data:

--required output
-- 01 - 03
-- 08 - 09
-- 12 - 14

DECLARE @maxRN int;
WITH #tmp AS (
                SELECT CAST('2013-01-01' AS date) DT
    UNION ALL   SELECT CAST('2013-01-02' AS date)
    UNION ALL   SELECT CAST('2013-01-03' AS date)
    UNION ALL   SELECT CAST('2013-01-05' AS date)
    UNION ALL   SELECT CAST('2013-01-08' AS date)
    UNION ALL   SELECT CAST('2013-01-09' AS date)
    UNION ALL   SELECT CAST('2013-01-12' AS date)
    UNION ALL   SELECT CAST('2013-01-13' AS date)
    UNION ALL   SELECT CAST('2013-01-14' AS date)
),
#numbered AS (
    SELECT 0 RN, CAST('1900-01-01' AS date) DT
    UNION ALL
    SELECT ROW_NUMBER() OVER (ORDER BY DT) RN, DT
    FROM #tmp
)

SELECT * INTO #tmpTable FROM #numbered;
SELECT @maxRN = MAX(RN) FROM #tmpTable;

INSERT INTO #tmpTable
SELECT @maxRN + 1, CAST('2100-01-01' AS date);

WITH #paired AS (
    SELECT 
    ROW_NUMBER() OVER(ORDER BY TStart.DT) RN, TStart.DT DTS, TEnd.DT DTE
    FROM #tmpTable TStart
    INNER JOIN #tmpTable TEnd 
    ON TStart.RN = TEnd.RN - 1
    AND DATEDIFF(dd,TStart.DT,TEnd.DT) > 1  
)

SELECT TS.DTE, TE.DTs 
FROM #paired TS
INNER JOIN #paired TE ON TS.RN = TE.RN -1
AND TS.DTE <> TE.DTs -- you could remove this filter if you want to have start and end on the same date

DROP TABLE #tmpTable

Replace the #tmp data with your actual table.

【Comments】:

【Answer 5】:

You can do it like this; here is the sqlfiddle:

select
  min(ndate) as start_date,
  max(ndate) as end_date
from
(select
  ndate,
  dateadd(day, -row_number() over (order by ndate), ndate) as rnk
 from dates
 ) t
 group by
   rnk

【Comments】:

【Answer 6】:

Another simple solution that would work here:

with tmp as 
(
select
datefield
-- order by the same column that is being shifted
, dateadd('day',-row_number() over(order by datefield asc),datefield) as date_group 
from table -- placeholder: substitute your own table name
)
select
min(datefield) as start_date
, max(datefield) as end_date 
from tmp
group by date_group

【Comments】:

【Answer 7】:

SELECT InfoDate ,
    CASE
      WHEN TRUNC(InfoDate - 1) = TRUNC(lag(InfoDate,1,InfoDate) over (order by InfoDate))
      THEN NULL
      ELSE InfoDate
    END STARTDATE,
    CASE
      WHEN TRUNC(InfoDate + 1) = TRUNC(lead(InfoDate,1,InfoDate) over (order by InfoDate))
      THEN NULL
      ELSE InfoDate
    END ENDDATE
  FROM TABLE;
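
The query above only flags each row as a range start, a range end, both, or neither; it does not yet return one row per range. One possible way to collapse those flags into (start, end) pairs is sketched below in Python. This is an assumption about how the flags might be consumed, not part of the original answer, and `ranges_from_flags` and `sample` are hypothetical names.

```python
from datetime import date, timedelta


def ranges_from_flags(dates):
    """Flag starts and ends with lag/lead logic, then pair them in order."""
    ordered = sorted(set(dates))
    one_day = timedelta(days=1)
    ranges = []
    current_start = None
    for idx, d in enumerate(ordered):
        # Equivalent of LAG and LEAD over (order by InfoDate).
        prev_d = ordered[idx - 1] if idx > 0 else None
        next_d = ordered[idx + 1] if idx + 1 < len(ordered) else None
        # STARTDATE is set when the previous row is not the day before.
        is_start = prev_d != d - one_day
        # ENDDATE is set when the next row is not the day after.
        is_end = next_d != d + one_day
        if is_start:
            current_start = d
        if is_end:
            ranges.append((current_start, d))
    return ranges


sample = [
    date(2014, 1, 29), date(2014, 1, 30), date(2014, 1, 31),
    date(2014, 2, 3), date(2014, 2, 4),
]

for start, end in ranges_from_flags(sample):
    print(start, end)
```

Because rows are visited in date order, every end flag belongs to the most recent start flag, which is why a single pass is enough.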

【Comments】:
