Using Gaps and Islands to Find Consecutive Hours/Dates - SQL/BigQuery

[Original title]: Using Gaps and Islands to Find Consecutive Hours/Dates- SQL/BigQuery
[Posted]: 2016-03-21 20:45:21
[Question]:

I have a table in BigQuery that looks like this:

Caller_Number | month  |  day| call_time
--------------|--------|-----|----------
1             |  5     |  15 | 12:56:17

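I want to write a SQL query for BigQuery that counts the number of consecutive hours in which at least one call was made (sorted by caller_number), and the number of consecutive days on which calls were made in at least 10 consecutive hours (sorted by caller_number). I have been looking at existing resources on gaps and islands, but I can't figure out how to apply the technique to consecutive dates and hours.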

[Comments]:

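"sorted by caller_number" makes the question above very ambiguous. You should provide more details, and you may want to share an example of the expected result. Whether it is "sorted by caller_number" or "partitioned by caller_number" makes the story totally different.

OK, a sample result would look like this: Caller_Number | Month | Day | Num_Consec_Hours. A second sample result would look like this: Caller_Number | Month | Num_Consec_Days

[Answer 1]: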

Below is a working example for consecutive hours, written in BigQuery legacy SQL. The steps are:

1. "Extract" the hour from call_time

HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time))

2. Find the previous hour

LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

3. Flag the start of each group of consecutive hours: 1 means a new group starts, 0 means the current group continues

IFNULL(INTEGER([hour] - prev_hour > 1), 1)

4. Assign a group number to each group (a running sum of the start flags)

SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

5. Finally, group by the group number and count the calls and the distinct hours
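To check the five steps by hand on a few rows, the same flag-and-running-sum logic can be sketched in plain Python. This is only an illustration and not part of the original answer: the function name `hour_islands` is hypothetical, and every sample row except the first is made up.

```python
from itertools import groupby


def hour_islands(calls):
    """calls: iterable of (caller_number, month, day, 'HH:MM:SS') tuples.

    Returns (caller_number, month, day, seq_group, hours_count, calls_count)
    rows, mirroring the columns produced by the SQL steps.
    """
    # Step 1: extract the hour from call_time, then sort so that each
    # (caller, month, day) partition is contiguous and ordered by hour.
    rows = sorted((c, m, d, int(t.split(":")[0])) for c, m, d, t in calls)

    result = []
    for (caller, month, day), part in groupby(rows, key=lambda r: r[:3]):
        prev_hour = None
        seq_group = 0
        groups = {}  # seq_group -> hours of all calls in that island
        for *_, hour in part:
            # Steps 2-3: compare with the previous hour; a missing previous
            # hour or a gap larger than 1 starts a new island (flag = 1).
            if prev_hour is None or hour - prev_hour > 1:
                seq_group += 1  # Step 4: running sum of the start flags
            groups.setdefault(seq_group, []).append(hour)
            prev_hour = hour
        # Step 5: per island, count distinct hours and total calls.
        for group, hours in groups.items():
            result.append((caller, month, day, group, len(set(hours)), len(hours)))
    return result


# Hypothetical sample data; only the first row comes from the question.
sample = [
    (1, 5, 15, "12:56:17"),
    (1, 5, 15, "13:05:00"),
    (1, 5, 15, "13:40:12"),
    (1, 5, 15, "14:01:59"),
    (1, 5, 15, "17:30:00"),
    (2, 5, 15, "09:00:00"),
]

for row in hour_islands(sample):
    print(row)
```

With this data, caller 1 gets one island covering hours 12 to 14 (3 distinct hours, 4 calls) and a second island for the single call at 17.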

Hopefully this gives you a good start for implementing similar logic for consecutive days on top of the consecutive-hours result.

SELECT Caller_Number, [month], [day], seq_group, 
  EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count 
FROM (
  SELECT Caller_Number, [month], [day], [hour],  
    SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] 
                  ORDER BY [hour]) AS seq_group
  FROM (
    SELECT Caller_Number, [month], [day], [hour], 
      IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
    FROM (
      SELECT Caller_Number, [month], [day], [hour], 
        LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] 
                         ORDER BY [hour]) AS prev_hour
      FROM (
        SELECT Caller_Number, [month], [day], 
          HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour] 
        FROM YourTable
      )
    )
  )
)
GROUP BY Caller_Number, [month], [day], seq_group
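The answer leaves the consecutive-days part to the reader. One possible reading, sketched in Python rather than SQL, is to mark a day as qualifying when it contains an island of at least 10 consecutive hours, and then run the same island logic over the qualifying dates. This is an assumption about what the asker wants, not the answerer's method. The function names are hypothetical, the sample data is made up, and because the table has no year column the sketch assumes a fixed year.

```python
from datetime import date
from itertools import groupby

ASSUMED_YEAR = 2016  # assumption: the table has no year column
MIN_HOURS = 10       # threshold taken from the question


def longest_hour_run(hours):
    """Length of the longest run of consecutive hour values."""
    best = run = 0
    prev = None
    for h in sorted(set(hours)):
        run = run + 1 if prev is not None and h - prev == 1 else 1
        best = max(best, run)
        prev = h
    return best


def qualifying_day_runs(calls, min_hours=MIN_HOURS, year=ASSUMED_YEAR):
    """Return (caller, first_date, num_consec_days) per island of qualifying days."""
    hours_by_day = {}
    for caller, month, day, call_time in calls:
        key = (caller, date(year, month, day))
        hours_by_day.setdefault(key, []).append(int(call_time.split(":")[0]))

    # Keep only the days that contain a long enough island of hours.
    qualifying = sorted(
        key for key, hours in hours_by_day.items()
        if longest_hour_run(hours) >= min_hours
    )

    runs = []
    for caller, days in groupby(qualifying, key=lambda k: k[0]):
        start = prev = None
        length = 0
        for _, d in days:
            if prev is not None and (d - prev).days == 1:
                length += 1  # the island of days continues
            else:
                if start is not None:
                    runs.append((caller, start, length))
                start, length = d, 1  # a new island of days starts
            prev = d
        if start is not None:
            runs.append((caller, start, length))
    return runs


# Hypothetical data: ten consecutive hours of calls on May 15, 16 and 18,
# but only five hours on May 17.
sample = [(1, 5, d, f"{h:02d}:00:00") for d in (15, 16, 18) for h in range(8, 18)]
sample += [(1, 5, 17, f"{h:02d}:30:00") for h in range(8, 13)]

for run in qualifying_day_runs(sample):
    print(run)
```

Here May 17 breaks the streak, so the sketch reports a 2-day island starting on May 15 and a 1-day island on May 18. Converting month and day to a real date keeps the comparison correct across month boundaries, which a plain difference of day numbers would miss.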

[Comments]:

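This worked well for me, with one slight difference: my "islands" are based on an arbitrary time interval between 2 unix timestamps, so my "seq" field is given by: IF((IF( prev_stamp IS NOT NULL,(stamp - prev_stamp),stamp ) - (3*86400)) > 0, 1, 0 ) AS Seq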
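The commenter's variant replaces the "gap larger than 1 hour" test with "gap larger than 3 days" on raw Unix timestamps. A minimal Python sketch of that flag, with hypothetical function names and made-up timestamps, could look like this:

```python
from itertools import accumulate

MAX_GAP = 3 * 86400  # 3 days in seconds, as in the comment's expression


def seq_flags(stamps, max_gap=MAX_GAP):
    """1 where a new island starts, 0 where the current island continues."""
    flags = []
    prev = None
    for stamp in sorted(stamps):
        # Mirrors IF(prev_stamp IS NOT NULL, stamp - prev_stamp, stamp):
        # with no previous row, the raw stamp itself is compared.
        delta = stamp - prev if prev is not None else stamp
        flags.append(1 if delta - max_gap > 0 else 0)
        prev = stamp
    return flags


def island_ids(stamps, max_gap=MAX_GAP):
    """Running sum of the flags, i.e. the island number of each timestamp."""
    return list(accumulate(seq_flags(stamps, max_gap)))


# Hypothetical timestamps: three events a day apart, a 4-day gap, then two more.
base = 1_458_000_000
stamps = [base, base + 86400, base + 2 * 86400, base + 6 * 86400, base + 6 * 86400 + 3600]

print(seq_flags(stamps))   # [1, 0, 0, 1, 0]
print(island_ids(stamps))  # [1, 1, 1, 2, 2]
```

Note that the first row is flagged only because a present-day Unix timestamp is far larger than the 3-day threshold. A gap of exactly 3 days does not start a new island, since the comparison is strict.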
