How can I fold same-value rows in time order in ClickHouse?

【Posted】: 2021-12-30 02:22:38

【Question】:

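For example, given a status that changes over time, how can I get the start time and end time of each consecutive run of the same status?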

From (status, time):

(1, '2020-11-08 01:00:01'), 
(1, '2020-11-08 01:00:02'), 
(2, '2020-11-08 01:00:03'), 
(2, '2020-11-08 01:00:04'), 
(2, '2020-11-08 01:00:05'), 
(2, '2020-11-08 01:00:06'), 
(1, '2020-11-08 01:00:07'), 
(1, '2020-11-08 01:00:08')

To (status, start time, end time):

1, '2020-11-08 01:00:01', '2020-11-08 01:00:02'
2, '2020-11-08 01:00:03', '2020-11-08 01:00:06'
1, '2020-11-08 01:00:07', '2020-11-08 01:00:08'
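
To make the expected result concrete, the fold can be sketched outside the database. This is a minimal Python reference for cross-checking query output, not a ClickHouse solution; the function name `fold_runs` is hypothetical, and the sketch assumes the rows are already sorted by time.

```python
from itertools import groupby


def fold_runs(rows):
    """Collapse consecutive rows with the same status into
    (status, start_time, end_time) tuples.

    Assumes `rows` is a list of (status, time) tuples sorted by time.
    """
    result = []
    # groupby merges only adjacent equal keys, so a status that reappears
    # later starts a new run instead of joining the earlier one.
    for status, run in groupby(rows, key=lambda row: row[0]):
        times = [time for _, time in run]
        result.append((status, times[0], times[-1]))
    return result


rows = [
    (1, '2020-11-08 01:00:01'),
    (1, '2020-11-08 01:00:02'),
    (2, '2020-11-08 01:00:03'),
    (2, '2020-11-08 01:00:04'),
    (2, '2020-11-08 01:00:05'),
    (2, '2020-11-08 01:00:06'),
    (1, '2020-11-08 01:00:07'),
    (1, '2020-11-08 01:00:08'),
]

for folded in fold_runs(rows):
    print(folded)
```

A plain `GROUP BY status` would merge the two separate runs of status 1, which is why the grouping has to respect adjacency in time order.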

【Answer 1】:

I would look at window functions to solve this:

SELECT any(status) status, min(time) start_time, max(time) end_time
FROM (
  /* 4. Assign a unique id to each series: a start/end pair shares intDiv(rn - 1, 2), a single-row series keeps its own rn. */
  SELECT status, time, 
    is_single_event == 1 ? rn : intDiv(toUInt32(rn - 1), 2) AS group_id
  FROM (
    /* 3. Number all boundary rows. Single-row series are sorted last ('OVER (ORDER BY is_single_event, time)') so that they do not shift the numbering of the start/end pairs. */
    SELECT status, time, 
      start_interval_mark != 0 AND end_interval_mark != 0 AS is_single_event, 
      row_number() OVER (ORDER BY is_single_event, time) AS rn
    FROM (
      /* 1. Mark the start and end of each series. */
      SELECT status, time, 
        groupBitXor(status) OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) start_interval_mark,
        groupBitXor(status) OVER (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) end_interval_mark
      FROM (
        /* Prepare the test dataset. */
        SELECT data.1 status, data.2 time
        FROM (
          SELECT arrayJoin([
            (1, '2020-11-08 01:00:01'), 
            (1, '2020-11-08 01:00:02'), 
            (2, '2020-11-08 01:00:03'), 
            (2, '2020-11-08 01:00:04'), 
            (2, '2020-11-08 01:00:05'), 
            (2, '2020-11-08 01:00:06'), 
            (1, '2020-11-08 01:00:07'), 
            (1, '2020-11-08 01:00:08'),
            (2, '2020-11-08 01:00:09'),
            (3, '2020-11-08 01:00:10'),
            (3, '2020-11-08 01:00:11'),
            (3, '2020-11-08 01:00:12'),
            (1, '2020-11-08 01:00:13')]) data
        )
        ORDER BY time
      )
    )
    /* 2. Exclude all intermediate events. */
    WHERE start_interval_mark != 0 OR end_interval_mark != 0
  )
)
GROUP BY group_id
ORDER BY start_time
SETTINGS allow_experimental_window_functions = 1;

/*
┌─status─┬─start_time──────────┬─end_time────────────┐
│      1 │ 2020-11-08 01:00:01 │ 2020-11-08 01:00:02 │
│      2 │ 2020-11-08 01:00:03 │ 2020-11-08 01:00:06 │
│      1 │ 2020-11-08 01:00:07 │ 2020-11-08 01:00:08 │
│      2 │ 2020-11-08 01:00:09 │ 2020-11-08 01:00:09 │
│      3 │ 2020-11-08 01:00:10 │ 2020-11-08 01:00:12 │
│      1 │ 2020-11-08 01:00:13 │ 2020-11-08 01:00:13 │
└────────┴─────────────────────┴─────────────────────┘
*/
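
The four numbered steps in the query can be traced by hand. The following Python sketch mimics them on the same test dataset, purely as an illustration of the logic; the function name `fold_by_boundary_marks` is hypothetical, and treating a missing neighbour as 0 is an assumption meant to mirror XOR over a one-row window frame.

```python
def fold_by_boundary_marks(rows):
    """Mimic the query's steps on (status, time) rows sorted by time."""
    n = len(rows)
    boundary = []
    for i, (status, time) in enumerate(rows):
        # Step 1: XOR of the current status with its neighbour. At either
        # edge there is no neighbour, so the mark is the status itself.
        prev_status = rows[i - 1][0] if i > 0 else 0
        next_status = rows[i + 1][0] if i < n - 1 else 0
        start_mark = prev_status ^ status
        end_mark = status ^ next_status
        # Step 2: drop rows strictly inside a series (both marks are 0).
        if start_mark != 0 or end_mark != 0:
            is_single = start_mark != 0 and end_mark != 0
            boundary.append((is_single, time, status))

    # Step 3: number the boundary rows, single-row series last.
    boundary.sort(key=lambda b: (b[0], b[1]))

    groups = {}
    for rn, (is_single, time, status) in enumerate(boundary, start=1):
        # Step 4: a start/end pair shares (rn - 1) // 2; a single keeps rn.
        group_id = rn if is_single else (rn - 1) // 2
        groups.setdefault(group_id, []).append((status, time))

    folded = []
    for members in groups.values():
        times = [time for _, time in members]
        folded.append((members[0][0], min(times), max(times)))
    return sorted(folded, key=lambda g: g[1])


statuses = [1, 1, 2, 2, 2, 2, 1, 1, 2, 3, 3, 3, 1]
rows = [(s, '2020-11-08 01:00:%02d' % (i + 1)) for i, s in enumerate(statuses)]

for folded in fold_by_boundary_marks(rows):
    print(folded)
```

One thing the sketch makes visible: the marks on the first and last rows are non-zero only when the status itself is non-zero, so the approach appears to rely on status values never being 0.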
