Impala SQL: Merging rows with overlapping dates. WHERE EXISTS and recursive CTE not supported

[Posted]: 2017-04-24 15:13:42

[Question]:

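I am trying to merge rows with overlapping date intervals in a table in Impala SQL. However, the solutions I have found for this problem rely on features Impala does not support, e.g. WHERE EXISTS and recursive CTEs.

How can I write a query for this in Impala?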

    Table: @T
    ID  StartDate   EndDate
    1   20170101    20170201
    2   20170101    20170401
    3   20170505    20170531    
    4   20170530    20170531
    5   20170530    20170831
    6   20171001    20171005
    7   20171101    20171225
    8   20171105    20171110

    Required Output:
    StartDate   EndDate
    20170101    20170401
    20170505    20170831
    20171001    20171005

An example I tried to implement, which Impala does not support:

    SELECT 
           s1.StartDate,
           MIN(t1.EndDate) AS EndDate
    FROM @T s1 
    INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
      AND NOT EXISTS(SELECT * FROM @T t2 
             WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate) 
    WHERE NOT EXISTS(SELECT * FROM @T s2 
                     WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate) 
    GROUP BY s1.StartDate 
    ORDER BY s1.StartDate 

Similar questions:

Merge overlapping date intervals

Eliminate and reduce overlapping date ranges

https://gerireshef.wordpress.com/2010/05/02/packing-date-intervals/

https://www.sqlservercentral.com/Forums/Topic826031-8-1.aspx

[Comments on the question]:

Be sure to use the ISO date format: YYYY-MM-DD

[Answer 1]:

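    -- Step 3: collapse each group of overlapping rows into one interval
    select  min(StartDate)  as StartDate
           ,max(EndDate)    as EndDate

    from   (select  StartDate,EndDate
                   -- Step 2: running count of gap flags, used as a group number
                   ,count (is_gap) over
                    (
                        order by    StartDate,ID
                    )   as range_id

            from   (select  ID,StartDate,EndDate
                           -- Step 1: flag a row when all preceding rows end before it starts
                           ,case
                                when    max (EndDate) over
                                        (
                                            order by    StartDate,ID
                                            rows        between unbounded preceding
                                                        and     1 preceding
                                        ) < StartDate
                                then    true
                            end as is_gap

                    from    t
                    ) t
            ) t

    group by    range_id

    order by    StartDate
    ;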

+------------+------------+
| startdate  | enddate    |
+------------+------------+
| 2017-01-01 | 2017-04-01 |
| 2017-05-05 | 2017-08-31 |
| 2017-10-01 | 2017-10-05 |
| 2017-11-01 | 2017-12-25 |
+------------+------------+
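
To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea: sort by (StartDate, ID), track the largest EndDate seen so far, and start a new group whenever that running maximum is smaller than the current StartDate. The function name `merge_overlapping` and the tuple layout are illustrative assumptions, not part of the original answer, and the dates are kept as the YYYYMMDD integers from the question.

```python
from typing import List, Optional, Tuple

# (ID, StartDate, EndDate); dates are YYYYMMDD integers as in the question.
Row = Tuple[int, int, int]


def merge_overlapping(rows: List[Row]) -> List[Tuple[int, int]]:
    """Hypothetical helper mirroring the window-function query above."""
    merged: List[Tuple[int, int]] = []
    running_max_end: Optional[int] = None  # max(EndDate) over all preceding rows

    # Same ordering as the window clause: ORDER BY StartDate, ID
    for _id, start, end in sorted(rows, key=lambda r: (r[1], r[0])):
        # is_gap: every preceding row ends before this one starts
        is_gap = running_max_end is not None and running_max_end < start
        if running_max_end is None or is_gap:
            # First row, or a gap: open a new group (a new range_id)
            merged.append((start, end))
        else:
            # Overlap: extend the current group, like max(EndDate) per range_id
            group_start, group_end = merged[-1]
            merged[-1] = (group_start, max(group_end, end))
        running_max_end = end if running_max_end is None else max(running_max_end, end)

    return merged


rows = [
    (1, 20170101, 20170201),
    (2, 20170101, 20170401),
    (3, 20170505, 20170531),
    (4, 20170530, 20170531),
    (5, 20170530, 20170831),
    (6, 20171001, 20171005),
    (7, 20171101, 20171225),
    (8, 20171105, 20171110),
]
result = merge_overlapping(rows)
for start, end in result:
    print(start, end)
```

On the sample table this yields the same four intervals as the query output above, including the 20171101 to 20171225 interval formed by IDs 7 and 8.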

[Comments]:

This does exactly what I wanted. Thank you so much! You're the best. A lifesaver.
