How can I fold same-value rows in time order in ClickHouse?
【Posted】: 2021-12-30 02:22:38
【Question】: For example, when a status changes over time, how can I get the start time and end time of each run of the same status?
From (status, time):
(1, '2020-11-08 01:00:01'),
(1, '2020-11-08 01:00:02'),
(2, '2020-11-08 01:00:03'),
(2, '2020-11-08 01:00:04'),
(2, '2020-11-08 01:00:05'),
(2, '2020-11-08 01:00:06'),
(1, '2020-11-08 01:00:07'),
(1, '2020-11-08 01:00:08')
To (status, start_time, end_time):
1, '2020-11-08 01:00:01', '2020-11-08 01:00:02'
2, '2020-11-08 01:00:03', '2020-11-08 01:00:06'
1, '2020-11-08 01:00:07', '2020-11-08 01:00:08'
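To make the requested transformation concrete, here is a minimal Python sketch of the expected result. It is only a reference model of the folding logic, not ClickHouse code; the function name `fold_runs` is hypothetical, and it assumes the timestamps are fixed-width strings that sort correctly as text.

```python
from itertools import groupby

rows = [
    (1, '2020-11-08 01:00:01'),
    (1, '2020-11-08 01:00:02'),
    (2, '2020-11-08 01:00:03'),
    (2, '2020-11-08 01:00:04'),
    (2, '2020-11-08 01:00:05'),
    (2, '2020-11-08 01:00:06'),
    (1, '2020-11-08 01:00:07'),
    (1, '2020-11-08 01:00:08'),
]


def fold_runs(rows):
    """Collapse consecutive rows with the same status into (status, start, end)."""
    # Order by time first; fixed-width timestamps compare correctly as strings.
    ordered = sorted(rows, key=lambda r: r[1])
    folded = []
    # groupby only merges *adjacent* equal keys, so a status that returns
    # later (1 -> 2 -> 1) yields a separate run each time.
    for status, run in groupby(ordered, key=lambda r: r[0]):
        times = [t for _, t in run]
        folded.append((status, times[0], times[-1]))
    return folded


for row in fold_runs(rows):
    print(row)
```

Any SQL solution should give the same three rows for this input, which makes the sketch usable as a quick check.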
【Answer 1】: I would look at window functions to solve it:
SELECT any(status) status, min(time) start_time, max(time) end_time
FROM (
    /* 4. Assign the unique id for each event group. */
    SELECT status, time,
        is_single_event == 1 ? rn : intDiv(toUInt32(rn - 1), 2) AS group_id
    FROM (
        /* 3. Number all paired rows (take into account that single events were moved to down - 'OVER (ORDER BY is_single_event, time)' - to avoid impacting the numbering of pairs). */
        SELECT status, time,
            start_interval_mark != 0 AND end_interval_mark != 0 AS is_single_event,
            row_number() OVER (ORDER BY is_single_event, time) AS rn
        FROM (
            /* 1. Mark the start and end of each series. */
            SELECT status, time,
                groupBitXor(status) OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) start_interval_mark,
                groupBitXor(status) OVER (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) end_interval_mark
            FROM (
                /* Prepare the test dataset. */
                SELECT data.1 status, data.2 time
                FROM (
                    SELECT arrayJoin([
                        (1, '2020-11-08 01:00:01'),
                        (1, '2020-11-08 01:00:02'),
                        (2, '2020-11-08 01:00:03'),
                        (2, '2020-11-08 01:00:04'),
                        (2, '2020-11-08 01:00:05'),
                        (2, '2020-11-08 01:00:06'),
                        (1, '2020-11-08 01:00:07'),
                        (1, '2020-11-08 01:00:08'),
                        (2, '2020-11-08 01:00:09'),
                        (3, '2020-11-08 01:00:10'),
                        (3, '2020-11-08 01:00:11'),
                        (3, '2020-11-08 01:00:12'),
                        (1, '2020-11-08 01:00:13')]) data)
                ORDER BY time
            )
        )
        /* 2. Exclude all intermediate events. */
        WHERE start_interval_mark != 0 OR end_interval_mark != 0
    )
)
GROUP BY group_id
ORDER BY start_time
SETTINGS allow_experimental_window_functions = 1;
/*
┌─status─┬─start_time──────────┬─end_time────────────┐
│      1 │ 2020-11-08 01:00:01 │ 2020-11-08 01:00:02 │
│      2 │ 2020-11-08 01:00:03 │ 2020-11-08 01:00:06 │
│      1 │ 2020-11-08 01:00:07 │ 2020-11-08 01:00:08 │
│      2 │ 2020-11-08 01:00:09 │ 2020-11-08 01:00:09 │
│      3 │ 2020-11-08 01:00:10 │ 2020-11-08 01:00:12 │
│      1 │ 2020-11-08 01:00:13 │ 2020-11-08 01:00:13 │
└────────┴─────────────────────┴─────────────────────┘
*/
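The four numbered steps in the query can be hard to follow, so the sketch below replays them in Python on the same test dataset. It is an emulation under stated assumptions, not ClickHouse code: the helper names `xor_marks` and `fold_with_marks` are hypothetical, and a frame that extends past the first or last row is modelled as XOR with 0, which leaves the row's own status as the mark. The sketch therefore assumes status values are never 0, since a 0 status at either edge would produce a 0 mark and not be recognised as a boundary.

```python
rows = [
    (1, '2020-11-08 01:00:01'), (1, '2020-11-08 01:00:02'),
    (2, '2020-11-08 01:00:03'), (2, '2020-11-08 01:00:04'),
    (2, '2020-11-08 01:00:05'), (2, '2020-11-08 01:00:06'),
    (1, '2020-11-08 01:00:07'), (1, '2020-11-08 01:00:08'),
    (2, '2020-11-08 01:00:09'), (3, '2020-11-08 01:00:10'),
    (3, '2020-11-08 01:00:11'), (3, '2020-11-08 01:00:12'),
    (1, '2020-11-08 01:00:13'),
]


def xor_marks(statuses):
    """Step 1: XOR each status with its previous and its next neighbour."""
    marks = []
    for i, s in enumerate(statuses):
        prev = statuses[i - 1] if i > 0 else 0  # assumption: missing neighbour acts as 0
        nxt = statuses[i + 1] if i + 1 < len(statuses) else 0
        # A mark of 0 means the neighbour has the same status.
        marks.append((s ^ prev, s ^ nxt))
    return marks


def fold_with_marks(rows):
    ordered = sorted(rows, key=lambda r: r[1])
    marks = xor_marks([s for s, _ in ordered])
    # Step 2: drop interior rows (both marks 0); flag runs of length one.
    kept = [
        (s, t, start != 0 and end != 0)
        for (s, t), (start, end) in zip(ordered, marks)
        if start != 0 or end != 0
    ]
    # Step 3: number the rows with single events sorted last, so the
    # remaining rows pair up as (start, end), (start, end), ...
    numbered = sorted(kept, key=lambda r: (r[2], r[1]))
    # Step 4: pairs share intDiv(rn - 1, 2); a single event keeps its own rn,
    # which is always larger than any pair id, so ids cannot collide.
    groups = {}
    for rn, (s, t, single) in enumerate(numbered, start=1):
        group_id = rn if single else (rn - 1) // 2
        groups.setdefault(group_id, []).append((s, t))
    folded = [
        (members[0][0], min(t for _, t in members), max(t for _, t in members))
        for members in groups.values()
    ]
    return sorted(folded, key=lambda r: r[1])


for row in fold_with_marks(rows):
    print(row)
```

The key observation is that XOR of two equal values is 0, so only the first and last row of each run survive step 2, and a run of length one survives as a single row with both marks set.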