Detect and merge date range successive overlaps in SQL

【Posted】2020-08-14 19:55:50 【Question】

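I need to detect and merge overlapping date ranges in a table, but only when the overlapping rows are successive; overlaps between non-successive rows should be ignored.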

CREATE TABLE konto (konto_nummer INTEGER, start_datum DATE, end_datum DATE);
INSERT INTO konto VALUES (1, '2020-01-01 00:00:00.000000', '2020-01-10 00:00:00.000000');
INSERT INTO konto VALUES (1, '2020-01-12 00:00:00.000000', '2020-01-20 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-01 00:00:00.000000', '2020-01-10 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-05 00:00:00.000000', '2020-01-20 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-15 00:00:00.000000', '2020-01-25 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-02-05 00:00:00.000000', '2020-02-20 00:00:00.000000');
INSERT INTO konto VALUES (3, '2020-01-01 00:00:00.000000', '2020-01-25 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-01 00:00:00.000000', '2020-04-10 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-05 00:00:00.000000', '2020-04-15 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-16 00:00:00.000000', '2020-04-25 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-20 00:00:00.000000', '2020-04-30 00:00:00.000000');

In the original post, rows highlighted in the same color overlap successively.

I tried the following:

    SELECT
        ROW_NUMBER() OVER (ORDER BY konto_nummer, start_datum, end_datum) AS RN,
        konto_nummer,
        start_datum,
        end_datum,
        MAX(end_datum) OVER (PARTITION BY konto_nummer ORDER BY start_datum, end_datum ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Previousend_datum
    FROM konto;

But it also merges non-successive overlaps.

【Comments】

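What do you mean by "non-successive overlaps will be ignored"?

For example: rows 2, 3 and 4 are successive, but rows 2 and 6 are not. If rows 2 and 6 overlap, that does not count unless rows 2, 3, 4, 5 and 6 all overlap.

【Answer 1】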

This is a gaps-and-islands problem, and solving it takes several steps.

First, mark the gaps:

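with mark as (
  select *, 
         -- true when the previous row's end lies outside this row's range;
         -- null for the first row of each konto_nummer
         lag(end_datum) over w
           not between start_datum and end_datum as island
    from konto
  window w as (partition by konto_nummer
                   order by start_datum, end_datum)
),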

Then, number the islands:

grps as (
  select *, 
         -- running count of island starts; a null flag (first row) counts as a start
         sum(coalesce(island, true)::int) over w as grpnum 
    from mark
  window w as (partition by konto_nummer
                   order by start_datum, end_datum)
)

Finally, aggregate by group:

select konto_nummer, 
       min(start_datum) as start_datum, 
       max(end_datum) as end_datum
  from grps
 group by konto_nummer, grpnum
 order by 1, 2, 3;

Working fiddle here.
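
The three steps can also be traced outside the database. What follows is a minimal Python sketch, not the answer's own code: it applies the same lag-style test (the previous row's end_datum must fall between the current row's start_datum and end_datum) to the sample rows from the question. The function name merge_successive and the tuple layout are assumptions made for illustration.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (konto_nummer, start_datum, end_datum)
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 10)),
    (1, date(2020, 1, 12), date(2020, 1, 20)),
    (2, date(2020, 1, 1), date(2020, 1, 10)),
    (2, date(2020, 1, 5), date(2020, 1, 20)),
    (2, date(2020, 1, 15), date(2020, 1, 25)),
    (2, date(2020, 2, 5), date(2020, 2, 20)),
    (3, date(2020, 1, 1), date(2020, 1, 25)),
    (4, date(2020, 4, 1), date(2020, 4, 10)),
    (4, date(2020, 4, 5), date(2020, 4, 15)),
    (4, date(2020, 4, 16), date(2020, 4, 25)),
    (4, date(2020, 4, 20), date(2020, 4, 30)),
]


def merge_successive(rows):
    """Merge ranges that overlap the immediately preceding row (hypothetical helper)."""
    merged = []
    # Sorting tuples orders by konto_nummer, then start_datum, then end_datum.
    for konto, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev_end = None
        for _, start, end in group:
            # A row starts a new island when there is no previous row or the
            # previous row's end is not inside [start, end].
            island = prev_end is None or not (start <= prev_end <= end)
            if island:
                merged.append([konto, start, end])
            else:
                merged[-1][2] = max(merged[-1][2], end)
            prev_end = end
    return [tuple(m) for m in merged]


result = merge_successive(rows)
for konto, start, end in result:
    print(konto, start, end)
```

On the sample data this produces seven ranges: konto 2 collapses to 2020-01-01 through 2020-01-25 plus the separate February range, and konto 4 to 2020-04-01 through 2020-04-15 and 2020-04-16 through 2020-04-30.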

【Answer 2】

When the overlaps can be arbitrary, I prefer a cumulative maximum to lag(). It handles cases like this:

A ------- B -------- B --------------C-C-------A

Here it is:

select konto_nummer, min(start_datum), max(end_datum)
from (select k.*,
             count(*) filter (where prev_end_datum is null or prev_end_datum < start_datum) over
                (partition by konto_nummer order by start_datum) as grp
      from (select k.*,
                   max(end_datum) over (partition by konto_nummer order by start_datum range between unbounded preceding and '1 second' preceding) as prev_end_datum
            from konto k
           ) k
     ) k
group by konto_nummer, grp
order by konto_nummer, min(start_datum);

Here is a dbfiddle.
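
To illustrate the cumulative-maximum idea, here is a small Python sketch under simplifying assumptions: it keeps a running maximum of end_datum over all earlier rows of the same konto_nummer in sort order, rather than reproducing the query's range frame exactly. The function name merge_with_running_max and account number 9 are hypothetical. The example uses nested ranges shaped like the A/B/C diagram: the third row does not overlap the second, but both lie inside the first.

```python
from datetime import date


def merge_with_running_max(rows):
    """Merge ranges using the largest end seen so far per konto (hypothetical helper)."""
    merged = []
    current_konto, max_end = None, None
    for konto, start, end in sorted(rows):
        # New group when the account changes or every earlier range
        # of this account has already ended before this row starts.
        if konto != current_konto or max_end < start:
            merged.append([konto, start, end])
            current_konto, max_end = konto, end
        else:
            max_end = max(max_end, end)
            merged[-1][2] = max_end
    return [tuple(m) for m in merged]


# Nested ranges: a long outer range containing two short, disjoint ones.
nested = [
    (9, date(2020, 1, 1), date(2020, 1, 31)),
    (9, date(2020, 1, 5), date(2020, 1, 10)),
    (9, date(2020, 1, 20), date(2020, 1, 25)),
]

merged_nested = merge_with_running_max(nested)
print(merged_nested)
```

The three rows come back as a single range from 2020-01-01 to 2020-01-31. A test that looks only at the immediately preceding row would split them, because the second row ends before the third begins. That is the difference between the two answers: the running maximum treats an overlap with any earlier row as a continuation.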
