How to write a query for a gaps and islands problem?

Posted: 2018-10-01 05:28:53

Question:

This is a gaps and islands problem.

Meter_id |Realtimeclock      |I_Y|I_B|I_X|
201010   |27-09-2018 00:00:00|1.0|2.0|3.0|
201010   |27-09-2018 00:30:00|1.0|2.0|3.0|
201010   |27-09-2018 01:00:00|1.0|2.0|3.0|
201010   |27-09-2018 01:30:00|1.0|2.0|3.0|
201010   |27-09-2018 02:00:00|1.0| 0 |3.0|
201010   |27-09-2018 02:30:00|1.0| 0 |0  |
201010   |27-09-2018 03:00:00|1.0|2.0|3.0|
201010   |27-09-2018 03:30:00|1.0|2.0|3.0|
201011   |27-09-2018 00:00:00|1.0|2.0|3.0|
201011   |27-09-2018 00:30:00|1.0|2.0|3.0|
201010   |28-09-2018 03:00:00|1.0|2.0|3.0|
201010   |28-09-2018 03:30:00|1.0|2.0|3.0|
201011   |28-09-2018 04:00:00|1.0| 0 |0  |
201011   |28-09-2018 00:00:00|1.0|2.0|3.0|
201011   |28-09-2018 00:30:00|1.0|2.0|3.0|

One approach uses the row number difference method:

SELECT * FROM (
    WITH cte1 AS (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Meter_id ORDER BY Realtimeclock) rn
        FROM yourTable t
    ),
    cte2 AS (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Meter_id ORDER BY Realtimeclock) rn
        FROM yourTable t
        WHERE I_B <> 0
    ),
    cte3 AS (
        SELECT t1.*,
            t1.rn - t2.rn AS diff
        FROM cte1 t1
        INNER JOIN cte2 t2
            ON t1.Meter_id = t2.Meter_id AND t1.Realtimeclock = t2.Realtimeclock
    )
    SELECT
        Meter_id,
        MIN(Realtimeclock) AS start_time,
        MAX(Realtimeclock) AS end_time,
        COUNT(I_Y) AS I_Y,
        COUNT(I_B) AS I_B,
        COUNT(I_X) AS I_X,
        ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY meter_id) AS Spell
    FROM cte3
    GROUP BY
        Meter_id,
        diff
);
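
For readers new to the technique, here is a minimal Python sketch of why the row number difference identifies islands. It is an illustration only, not part of the original query: it walks the I_B readings of meter 201010 on 27-09-2018 from the sample data, and the variable names (`rn_all`, `rn_nonzero`) are made up for this example.

```python
from itertools import groupby

# Half-hourly I_B readings for meter 201010 on 27-09-2018, copied from the sample data.
readings = [
    ("00:00", 2.0), ("00:30", 2.0), ("01:00", 2.0), ("01:30", 2.0),
    ("02:00", 0.0), ("02:30", 0.0), ("03:00", 2.0), ("03:30", 2.0),
]

# rn_all numbers every row (like cte1); rn_nonzero numbers only rows with I_B <> 0 (like cte2).
rows = []
rn_nonzero = 0
for rn_all, (clock, i_b) in enumerate(readings, start=1):
    if i_b != 0:
        rn_nonzero += 1
        # The inner join in cte3 keeps only the non-zero rows.
        rows.append((clock, rn_all - rn_nonzero))

# Consecutive rows that share the same difference form one island.
islands = [
    (diff, [clock for clock, _ in group])
    for diff, group in groupby(rows, key=lambda r: r[1])
]
for diff, clocks in islands:
    print(diff, clocks[0], clocks[-1], len(clocks))
```

Each zero reading advances `rn_all` but not `rn_nonzero`, so the difference jumps after every gap and stays constant inside a run of non-zero readings. Grouping by that difference is what `GROUP BY Meter_id, diff` does in the query.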

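The output should look like the table below. Please let me know what changes are needed in the code.

Based on the I_Y, I_B, I_X conditions in the table above, I need day-wise spells, each with a start time and an end time, counting the non-zero values. Here Meter_id 201010 has two spells on the same day because there is a time gap between them. Likewise, the query must show all spells along with their date and timestamp.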

Meter_id |start_time         |End_time           |I_Y|I_B|I_X|spell
201010   |27-09-2018 00:00:00|27-09-2018 01:30:00|4  |4  |4  |1
201010   |27-09-2018 03:00:00|27-09-2018 03:30:00|2  |2  |2  |2
201011   |27-09-2018 00:00:00|27-09-2018 00:30:00|2  |2  |2  |1
201010   |28-09-2018 03:00:00|28-09-2018 03:30:00|2  |2  |2  |1
201011   |28-09-2018 00:00:00|28-09-2018 00:30:00|2  |2  |2  |1

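It throws the following runtime error:

[Error] Execution (35:22): ORA-01830: date format picture ends before converting entire input string

Hi Tim,

Please look into it. It would be a great help to me.

After using trunc(realtimeclock) instead of TO_DATE(realtimeclock) ..

Thanks for your help, Tim.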

Comments:

Can the start and end times wrap around to the next day? For example, can a spell have start_time 22:00:00 and end_time 02:00:00 on the following day?

Answer 1:

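You only need a slight change to your current approach: add a partition on the date (in addition to meter_id). Then, in the final query, add a COUNT that keeps track of the spell number for a given meter and date.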

WITH cte1 AS (
    -- Number every reading per meter and calendar day.
    -- TRUNC reduces Realtimeclock to its day; TO_DATE(Realtimeclock)
    -- raised ORA-01830 for the asker.
    SELECT t.*,
        ROW_NUMBER() OVER (PARTITION BY Meter_id, TRUNC(Realtimeclock)
            ORDER BY Realtimeclock) rn
    FROM yourTable t
),
cte2 AS (
    -- Number only the readings with a non-zero I_B.
    SELECT t.*,
        ROW_NUMBER() OVER (PARTITION BY Meter_id, TRUNC(Realtimeclock)
            ORDER BY Realtimeclock) rn
    FROM yourTable t
    WHERE I_B <> 0
),
cte3 AS (
    -- The difference stays constant within a run of non-zero readings.
    SELECT t1.*,
        t1.rn - t2.rn AS diff
    FROM cte1 t1
    INNER JOIN cte2 t2
        ON t1.Meter_id = t2.Meter_id AND t1.Realtimeclock = t2.Realtimeclock
)

SELECT
    Meter_id,
    MIN(Realtimeclock) AS start_time,
    MAX(Realtimeclock) AS end_time,
    COUNT(I_Y) AS I_Y,
    COUNT(I_B) AS I_B,
    COUNT(I_X) AS I_X,
    -- Running count of islands per meter and day, in start-time order.
    COUNT(*) OVER (PARTITION BY TRUNC(Realtimeclock), Meter_id
        ORDER BY MIN(Realtimeclock)) AS spell
FROM cte3
GROUP BY
    Meter_id,
    TRUNC(Realtimeclock),
    diff;
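
To check the logic without a database, the same steps can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the answer's code: `find_spells` and `SAMPLE` are hypothetical names, only the I_B column is carried (the query filters on I_B alone), and the count is the number of rows in each island.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%d-%m-%Y %H:%M:%S"
D27, D28 = "27-09-2018 ", "28-09-2018 "

# (Meter_id, Realtimeclock, I_B) rows taken from the sample data in the question.
SAMPLE = [
    (201010, D27 + "00:00:00", 2.0), (201010, D27 + "00:30:00", 2.0),
    (201010, D27 + "01:00:00", 2.0), (201010, D27 + "01:30:00", 2.0),
    (201010, D27 + "02:00:00", 0.0), (201010, D27 + "02:30:00", 0.0),
    (201010, D27 + "03:00:00", 2.0), (201010, D27 + "03:30:00", 2.0),
    (201011, D27 + "00:00:00", 2.0), (201011, D27 + "00:30:00", 2.0),
    (201010, D28 + "03:00:00", 2.0), (201010, D28 + "03:30:00", 2.0),
    (201011, D28 + "04:00:00", 0.0),
    (201011, D28 + "00:00:00", 2.0), (201011, D28 + "00:30:00", 2.0),
]


def find_spells(rows):
    """Return (meter_id, start, end, row_count, spell) per island of non-zero I_B."""
    # Equivalent of PARTITION BY Meter_id, TRUNC(Realtimeclock).
    partitions = defaultdict(list)
    for meter_id, clock, i_b in rows:
        ts = datetime.strptime(clock, FMT)
        partitions[(meter_id, ts.date())].append((ts, i_b))

    spells = []
    for (meter_id, _day), readings in sorted(partitions.items()):
        readings.sort()  # ORDER BY Realtimeclock
        islands = defaultdict(list)
        rn_nonzero = 0
        for rn_all, (ts, i_b) in enumerate(readings, start=1):
            if i_b != 0:
                rn_nonzero += 1
                islands[rn_all - rn_nonzero].append(ts)  # GROUP BY diff
        # Number the islands of this meter and day by their start time.
        ordered = sorted(islands.values(), key=lambda stamps: stamps[0])
        for spell, stamps in enumerate(ordered, start=1):
            spells.append((meter_id, stamps[0].strftime(FMT),
                           stamps[-1].strftime(FMT), len(stamps), spell))
    return spells


for row in find_spells(SAMPLE):
    print(row)
```

On the sample data this yields five spells: two for meter 201010 on 27-09-2018 and one for every other meter and day. The sketch returns rows sorted by meter and then day, which differs from the row order in the question's expected output.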

Demo

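Note that this answer assumes a shift (spell) never carries over from one calendar day to the next. If that can happen and you need to account for it, you should tell us what the logic is for counting such events.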
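
As a sketch of one possible rule for spells that cross midnight, the Python below starts a new spell whenever the gap to the previous non-zero reading exceeds the sampling interval. Both the rule and the 30-minute interval are assumptions read off the sample data, not something the question or the answer specifies, and `spells_by_gap` and the example readings are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y %H:%M:%S"
STEP = timedelta(minutes=30)  # assumed sampling interval, read off the sample data


def spells_by_gap(readings, step=STEP):
    """Group one meter's (clock, I_B) readings into runs of consecutive non-zero values."""
    stamps = sorted(
        datetime.strptime(clock, FMT) for clock, i_b in readings if i_b != 0
    )
    runs = []
    for ts in stamps:
        if runs and ts - runs[-1][-1] <= step:
            runs[-1].append(ts)  # within one interval: continues the current spell
        else:
            runs.append([ts])    # gap found: start a new spell
    return [(r[0].strftime(FMT), r[-1].strftime(FMT), len(r)) for r in runs]


# Hypothetical readings that straddle midnight (not from the question's data).
example = [
    ("27-09-2018 23:00:00", 2.0),
    ("27-09-2018 23:30:00", 2.0),
    ("28-09-2018 00:00:00", 2.0),
    ("28-09-2018 00:30:00", 0.0),
    ("28-09-2018 01:00:00", 2.0),
]
result = spells_by_gap(example)
print(result)
```

Unlike the row number difference, this rule also breaks a spell when a reading is missing entirely rather than zero. Whether that is desirable depends on the counting logic the asker has in mind.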

The demo is again in SQL Server, although the query above is Oracle code and should run without issues.
