Cassandra query on a secondary index with pagination becomes slower as data grows
Posted: 2018-09-12 17:30:48
Question:
When I query a secondary index with pagination, the query gets slower as the data grows. I assumed that with pagination, fetching one page takes the same time no matter how large the data is. Is that true? Why does my query get slower?
My simplified table is
CREATE TABLE closed_executions (
domain_id uuid,
workflow_id text,
start_time timestamp,
workflow_type_name text,
PRIMARY KEY ((domain_id), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC)
AND COMPACTION = {
'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
}
AND GC_GRACE_SECONDS = 172800;
I created a secondary index
CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);
I query with the following CQL
SELECT workflow_id, start_time, workflow_type_name
FROM closed_executions
WHERE domain_id = ?
AND start_time >= ?
AND start_time <= ?
AND workflow_type_name = ?
and this code
query := v.session.Query(templateGetClosedWorkflowExecutionsByType,
request.DomainUUID,
common.UnixNanoToCQLTimestamp(request.EarliestStartTime),
common.UnixNanoToCQLTimestamp(request.LatestStartTime),
request.WorkflowTypeName).Consistency(gocql.One)
iter := query.PageSize(request.PageSize).PageState(request.NextPageToken).Iter()
// PageSize is 10 here, but could be in the thousands
Environment:
MacBook Pro
Cassandra: 3.11.0
GoCql: github.com/gocql/gocql master
Observations:
10K rows: within a second
100K rows: about 3 seconds
1M rows: ~17 seconds
Debug log:
INFO [ScheduledTasks:1] 2018-09-11 16:29:48,349 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2018-09-11 16:29:48,357 MonitoringTask.java:173 - 1 operations were slow in the last 5005 msecs:
<SELECT * FROM cadence_visibility.closed_executions WHERE workflow_type_name = code.uber.internal/devexp/cadence-bench/load/basic.stressWorkflowExecute AND token(domain_id, domain_partition) >= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND token(domain_id, domain_partition) <= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND start_time >= 2018-09-11 16:29-0700 AND start_time <= 1969-12-31 16:00-0800 LIMIT 10>, time 2747 msec - slow timeout 500 msec
DEBUG [COMMIT-LOG-ALLOCATOR] 2018-09-11 16:31:47,774 AbstractCommitLogSegmentManager.java:107 - No segments in reserve; creating a fresh one
DEBUG [ScheduledTasks:1] 2018-09-11 16:40:22,922 ColumnFamilyStore.java:899 - Enqueuing flush of size_estimates: 23.997MiB (2%) on-heap, 0.000KiB (0%) off-heap
Related references (none of them answer my question):
https://lists.apache.org/thread.html/%3CCAAiKoBidknHVOz8oQQmncZFZHdFiDfW6HTs63vxXCOhisQYZgg@mail.gmail.com%3E
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
-- Edit
tablestats returns:
Total number of tables: 105
----------------
Keyspace : cadence_visibility
Read Count: 19
Read Latency: 0.5125263157894736 ms.
Write Count: 3220964
Write Latency: 0.04900822269357869 ms.
Pending Flushes: 0
Table: closed_executions
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 20.3 MiB
Space used (total): 20.3 MiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 6.35 KiB
SSTable Compression Ratio: 0.40192660515179696
Number of keys (estimate): 3
Memtable cell count: 28667
Memtable data size: 7.35 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 9
Local read count: 9
Local read latency: NaN ms
Local write count: 327024
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16 bytes
Bloom filter off heap memory used: 8 bytes
Index summary off heap memory used: 38 bytes
Compression metadata off heap memory used: 6.3 KiB
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 62479625
Compacted partition mean bytes: 31239902
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0 bytes
----------------
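Comments:
Can you include the cfstats for your index? Most likely the cardinality of your workflow_type_name is too low. Really, you should avoid secondary indexes altogether unless there is a strong consistency requirement that cannot be met without them. Scatter-gather queries will never scale well.
Thanks @ChrisLohfink for the reply. I am not sure whether tablestats is the cfstats you mentioned, but I pasted it at the end of the original post. My current understanding of the query is that Cassandra loads all data from the partition matching WHERE domain_id = ? AND start_time >= ? AND start_time <= ?, say 100k rows; then filters by the local index workflow_type_name = ?, leaving say 50k rows; then pagination returns pagesize 10 rows out of those 50k. Is my understanding correct? If so, why doesn't pagination kick in earlier, so that the filter-by-local-index step returns only 10 rows?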
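Answer 1:
Why doesn't pagination scale the way it does on the main table? The data in your secondary index is scattered. Pagination only applies at the logical level: rows are read and matching rows are returned until the page is full. Because your data is not clustered by time, you still have to sift through many, many rows before you find, for example, the first 10.
The query trace does show that paging comes into play at a very late stage.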
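To make the point about sifting through rows concrete, here is a minimal sketch in Go. It is a toy in-memory model, not Cassandra's actual read path: `row`, `fetchPage`, and `scannedForFirstPage` are hypothetical names, and the assumption is that candidates (rows already matching workflow_type_name) are walked in an order where the rows inside the requested time range come last.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one index hit: a row that already
// matches workflow_type_name, carrying its start_time as an integer.
type row struct {
	startTime int
}

// fetchPage walks candidates in order, keeps those inside [lo, hi], and
// stops once pageSize rows are collected. It returns the page and the
// number of candidates it had to examine.
func fetchPage(candidates []row, lo, hi, pageSize int) ([]row, int) {
	page := make([]row, 0, pageSize)
	scanned := 0
	for _, r := range candidates {
		if len(page) == pageSize {
			break
		}
		scanned++
		if r.startTime >= lo && r.startTime <= hi {
			page = append(page, r)
		}
	}
	return page, scanned
}

// scannedForFirstPage builds n candidates with start times 0..n-1 and asks
// for the last 100 start times, so every matching row sits at the end of
// the scan (an assumed worst case).
func scannedForFirstPage(n, pageSize int) int {
	candidates := make([]row, n)
	for i := range candidates {
		candidates[i] = row{startTime: i}
	}
	_, scanned := fetchPage(candidates, n-100, n-1, pageSize)
	return scanned
}

func main() {
	for _, n := range []int{10000, 100000, 1000000} {
		fmt.Printf("rows=%d scanned=%d\n", n, scannedForFirstPage(n, 10))
	}
}
```

In this model the page size caps how many rows are returned, not how many are examined: the first page of 10 costs n - 90 examined candidates, so the work grows with the data even though the page stays the same size.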
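Why is the secondary index slow? Cassandra first reads the index table to retrieve the primary keys of all matching rows, and then, for each matching row, reads the original table to fetch the data. This is a known anti-pattern with low-cardinality indexes. (See https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive)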