ClickHouse is slow while "Merging aggregated data"

Posted: 2019-12-05 06:50:11

Question:

I was looking at ClickHouse performance and noticed that a query is slow during the "Merging aggregated data" stage. In the log it looks like this:

10:55:20.988391 [ 53 ]  <Trace> HTTPHandler: Request URI: /?query_id=ef578bae-0aa1-11ea-8948-0242ac170006&database=some_db
10:55:20.993291 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> executeQuery: (from --, user: --)  --- QUERY ---
10:55:21.000491 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> some_db.od (SelectExecutor): Key condition: (column 0 in 552-element set)
10:55:21.001854 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> some_db.od (SelectExecutor): MinMax index condition: (column 0 in 552-element set)
10:55:21.018972 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> some_db.od (SelectExecutor): Selected 3 parts by date, 3 parts by key, 7195 marks to read from 7 ranges
10:55:21.019191 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> some_db.od (SelectExecutor): Reading approx. 58941440 rows with 4 streams
10:55:21.019396 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> InterpreterSelectQuery: FetchColumns -> Complete
10:55:21.020418 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> executeQuery: Query pipeline:
 Expression
  Expression
   ParallelAggregating
    Expression × 4
     Filter
      MergeTreeThread
10:55:21.020861 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Aggregating
10:55:21.027488 [ 62 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Aggregation method: keys128   
10:55:21.029127 [ 64 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Aggregation method: keys128   
10:55:21.038888 [ 56 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Aggregation method: keys128   
10:55:21.046746 [ 48 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Aggregation method: keys128   
10:55:21.116165 [ 48 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Converting aggregation data to two-level.
10:55:21.119995 [ 56 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Converting aggregation data to two-level.
10:55:21.124843 [ 64 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Converting aggregation data to two-level.
10:55:21.180181 [ 62 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Converting aggregation data to two-level.
10:55:26.468352 [ 48 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> MemoryTracker: Current memory usage: 1.01 GiB.
10:55:27.356930 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Aggregated. 14485433 to 2196249 rows (from 221.030 MiB) in 6.336 sec. (2286233.713 rows/sec., 34.885 MiB/sec.)
10:55:27.356989 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Aggregated. 14929109 to 2225915 rows (from 227.800 MiB) in 6.336 sec. (2356259.030 rows/sec., 35.954 MiB/sec.)
10:55:27.357031 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Aggregated. 14148579 to 2173827 rows (from 215.890 MiB) in 6.336 sec. (2233068.097 rows/sec., 34.074 MiB/sec.)
10:55:27.357061 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Aggregated. 15344221 to 2260723 rows (from 234.134 MiB) in 6.336 sec. (2421776.094 rows/sec., 36.953 MiB/sec.)
10:55:27.357133 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> ParallelAggregatingBlockInputStream: Total aggregated. 58907342 rows (from 898.855 MiB) in 6.336 sec. (9297336.934 rows/sec., 141.866 MiB/sec.)
10:55:27.357158 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> Aggregator: Merging aggregated data       
10:55:56.117053 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Information> executeQuery: Read 58933982 rows, 1.10 GiB in 35.120 sec., 1678071 rows/sec., 32.01 MiB/sec.
10:55:56.117925 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
10:55:56.170074 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> MemoryTracker: Peak memory usage (total): 1.64 GiB.
10:55:56.265958 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Debug> MemoryTracker: Peak memory usage (for query): 1.64 GiB.
10:55:56.266001 [ 53 ] ef578bae-0aa1-11ea-8948-0242ac170006 <Information> HTTPHandler: Done processing query

So Merging aggregated data takes 29 of the 35 seconds (83%). But I could not find any information about what this line even means. What is ClickHouse doing while "Merging aggregated data"?

I checked the server's resource usage, and it is not running out of memory or CPU time. The CPUs are not stuck in iowait either. So I simply don't understand what is limiting ClickHouse's performance. Does anyone know how I can fix the slow merging aggregated data?
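For reference, one way to double-check how much time and memory a single query actually consumed is ClickHouse's system.query_log table. This is only a hedged illustration: it assumes query logging is enabled on the server, and the query_id is the one from the trace above.

-- Look up the finished query by the query_id seen in the log above.
SELECT
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage
FROM system.query_log
WHERE query_id = 'ef578bae-0aa1-11ea-8948-0242ac170006'
  AND type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 1;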

EDIT: Here is the query:

SELECT site_id_from as on, site_id_to as off, sum(cnt)/23 as cnt
FROM some_db.od
WHERE timestamp_start in ('2019-10-01 00:00:00', '2019-10-01 01:00:00', ... , '2019-10-31 23:00:00') -- 552 keys
GROUP BY site_id_from, site_id_to

And here is the table definition:

CREATE TABLE IF NOT EXISTS some_db.od (
    `timestamp_start` DateTime('Europe/Moscow'),
    `site_id_from` Int32,
    `site_id_to` Int32,
    `cnt` Float64
)
ENGINE = MergeTree() 
PARTITION BY toYYYYMM(timestamp_start) 
ORDER BY timestamp_start;

Comments:

Merging aggregated data is the second step of two-level aggregation: 4 streams each aggregate 1/4 of the data in steps 1 and 2, and the Aggregator then merges the results. Your query seems to have a GROUP BY with_very_long_string. Is that the case?

I edited the question and added the query and the table definition; both are actually very simple. I failed to make clear that the problem is not permanent. I just tried the same query again and it finished in 5.5 seconds. But when many users (up to 10) start running queries like this, the "merging" becomes slow again, and I don't understand where the bottleneck is.

> 5.5 sec. But when many (up to 10): that is expected. CH uses all CPUs for a single query, so 10 parallel queries run roughly 10 times slower. Running 10 such queries in parallel is a misuse of CH. Use a MV (github.com/ClickHouse/ClickHouse/issues/…) and try to pre-aggregate the numbers.
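The last comment suggests pre-aggregating with a materialized view. Below is a minimal sketch of what that could look like for this table; the view name od_hourly_mv, the SummingMergeTree engine and the hour_start column are illustrative assumptions, not something given in the thread.

-- Roll cnt up per hour and per site pair at insert time, so that interactive
-- queries only need to merge rows that are already reduced.
CREATE MATERIALIZED VIEW IF NOT EXISTS some_db.od_hourly_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour_start)
ORDER BY (site_id_from, site_id_to, hour_start)
AS
SELECT
    toStartOfHour(timestamp_start) AS hour_start,
    site_id_from,
    site_id_to,
    sum(cnt) AS cnt
FROM some_db.od
GROUP BY hour_start, site_id_from, site_id_to;

-- Note: a materialized view only sees rows inserted after it is created;
-- existing data would need a separate backfill (e.g. INSERT ... SELECT).

-- The query from the question would then read the much smaller pre-aggregated data:
SELECT site_id_from, site_id_to, sum(cnt) / 23 AS cnt
FROM some_db.od_hourly_mv
WHERE hour_start IN ('2019-10-01 00:00:00', '2019-10-01 01:00:00') -- ..., same 552 keys
GROUP BY site_id_from, site_id_to;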

Answer 1:

Try changing the table definition to

CREATE TABLE IF NOT EXISTS some_db.od (
    `timestamp_start` DateTime('Europe/Moscow'),
    `site_id_from` Int32,
    `site_id_to` Int32,
    `cnt` Float64,
    INDEX ts (timestamp_start) TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree() 
PARTITION BY toYYYYMM(timestamp_start) 
ORDER BY site_id_from, site_id_to;

Changing the sorting key should reduce the time spent in GROUP BY, and the data-skipping index should reduce the time spent on the WHERE ... IN lookup. See https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/#table_engine-mergetree-data_skipping-indexes
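Since the sorting key of an existing MergeTree table generally cannot be changed in place, following this answer typically means creating the new table and copying the data over. A rough sketch; some_db.od_new and some_db.od_old are hypothetical names:

-- Create the new table with the definition above under the name some_db.od_new,
-- then backfill it from the existing table:
INSERT INTO some_db.od_new
SELECT timestamp_start, site_id_from, site_id_to, cnt
FROM some_db.od;

-- After verifying the copy, swap the tables so queries keep hitting some_db.od:
RENAME TABLE some_db.od TO some_db.od_old, some_db.od_new TO some_db.od;

-- If only the skipping index is wanted, without changing ORDER BY, it can be added
-- to the existing table instead (it is built for parts written or merged afterwards):
-- ALTER TABLE some_db.od ADD INDEX ts (timestamp_start) TYPE minmax GRANULARITY 1;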

