[ES in Action] Using Rally offline to benchmark with a custom track
Posted 顧棟
Offline installation

Install Rally offline and leave the default configuration unchanged after installation.

The Elasticsearch cluster under test runs on separate machines. In this case it is a self-built, custom-modified ES cluster, not one installed from the official release package.
Glossary

track

A track is the description of one or more benchmarking scenarios with a specific document corpus. It defines, for example, the indices involved, the data files, and the operations that are invoked. List the available tracks with esrally list tracks. Although Rally ships with some tracks out of the box, you should usually create your own based on your own data.

In short: the building blocks of a benchmark scenario, such as the read/write operations run against the cluster and the data used by the indices.
challenge
A challenge describes one benchmarking scenario, e.g. indexing documents with 4 clients at maximum throughput while issuing term and phrase queries from two other clients, each rate-limited to 10 queries per second. A challenge is always specified in the context of a track; see which challenges are available by listing the corresponding tracks with esrally list tracks.

In short: the varying test dimensions of a scenario, such as different levels of concurrency.
car
A car is a specific configuration of an Elasticsearch cluster, e.g. the out-of-the-box configuration, a configuration with a specific heap size, or a custom logging configuration. List the available cars with esrally list cars.

In short: the ES cluster configuration used in a benchmark scenario.
telemetry
Telemetry is used in Rally to gather metrics about the car, such as CPU usage or index size.

In short: the measurement data collected during a benchmark.
race
A race is a single invocation of the Rally binary; another name for it is "one benchmark trial". During a race, Rally runs one challenge on a track with the given car.

In short: one execution of a benchmark.
tournament
A tournament is a comparison of two races, carried out with Rally's tournament mode.

In short: a comparison between different benchmark runs.
Custom track: tutorial

Create a directory for the custom track: /opt/software/esrally/rally-tracks/tutorial

The custom track directory needs three files:
- documents.json: the structured data to be indexed. The data comes from the official test dataset allCountries.zip (around 300 MB); unpacking it yields a single file, allCountries.txt. A Python script then converts the data to JSON:

```python
import json

# (field name, type, include in the JSON output?)
cols = (("geonameid", "int", True),
        ("name", "string", True),
        ("asciiname", "string", False),
        ("alternatenames", "string", False),
        ("latitude", "double", True),
        ("longitude", "double", True),
        ("feature_class", "string", False),
        ("feature_code", "string", False),
        ("country_code", "string", True),
        ("cc2", "string", False),
        ("admin1_code", "string", False),
        ("admin2_code", "string", False),
        ("admin3_code", "string", False),
        ("admin4_code", "string", False),
        ("population", "long", True),
        ("elevation", "int", False),
        ("dem", "string", False),
        ("timezone", "string", False))

def main():
    with open("allCountries.txt", "rt", encoding="UTF-8") as f:
        for line in f:
            tup = line.strip().split("\t")
            record = {}
            for i in range(len(cols)):
                name, type, include = cols[i]
                if tup[i] != "" and include:
                    if type in ("int", "long"):
                        record[name] = int(tup[i])
                    elif type == "double":
                        record[name] = float(tup[i])
                    elif type == "string":
                        record[name] = tup[i]
            print(json.dumps(record, ensure_ascii=False))

if __name__ == "__main__":
    main()
```
Invoke the script:

```shell
python3 toJSON.py > documents.json
```
Sample output (one JSON document per line):

```json
{"geonameid": 2986043, "name": "Pic de Font Blanca", "latitude": 42.64991, "longitude": 1.53335, "country_code": "AD", "population": 0}
{"geonameid": 2994701, "name": "Roc Mélé", "latitude": 42.58765, "longitude": 1.74028, "country_code": "AD", "population": 0}
{"geonameid": 3007683, "name": "Pic des Langounelles", "latitude": 42.61203, "longitude": 1.47364, "country_code": "AD", "population": 0}
{"geonameid": 3017832, "name": "Pic de les Abelletes", "latitude": 42.52535, "longitude": 1.73343, "country_code": "AD", "population": 0}
{"geonameid": 3017833, "name": "Estany de les Abelletes", "latitude": 42.52915, "longitude": 1.73362, "country_code": "AD", "population": 0}
{"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "latitude": 42.62568, "longitude": 1.61823, "country_code": "AD", "population": 0}
{"geonameid": 3029315, "name": "Port de la Cabanette", "latitude": 42.6, "longitude": 1.73333, "country_code": "AD", "population": 0}
{"geonameid": 3034945, "name": "Port Dret", "latitude": 42.60172, "longitude": 1.45562, "country_code": "AD", "population": 0}
{"geonameid": 3038814, "name": "Costa de Xurius", "latitude": 42.50692, "longitude": 1.47569, "country_code": "AD", "population": 0}
{"geonameid": 3038815, "name": "Font de la Xona", "latitude": 42.55003, "longitude": 1.44986, "country_code": "AD", "population": 0}
{"geonameid": 3038816, "name": "Xixerella", "latitude": 42.55327, "longitude": 1.48736, "country_code": "AD", "population": 0}
{"geonameid": 3038818, "name": "Riu Xic", "latitude": 42.57165, "longitude": 1.67554, "country_code": "AD", "population": 0}
{"geonameid": 3038819, "name": "Pas del Xic", "latitude": 42.49766, "longitude": 1.57597, "country_code": "AD", "population": 0}
{"geonameid": 3038820, "name": "Roc del Xeig", "latitude": 42.56068, "longitude": 1.4898, "country_code": "AD", "population": 0}
```
- index.json: the mappings and settings of the index:

```json
{
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "geonameid": { "type": "long" },
        "name": { "type": "text" },
        "latitude": { "type": "double" },
        "longitude": { "type": "double" },
        "country_code": { "type": "text" },
        "population": { "type": "long" }
      }
    }
  }
}
```
- track.json: describes the index under test and the benchmark schedule:

```json
{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "geonames",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": { "operation-type": "delete-index" }
    },
    {
      "operation": { "operation-type": "create-index" }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": { "wait_for_status": "green" }
      },
      "retry-until-success": true
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": { "operation-type": "force-merge" }
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": { "match_all": {} }
        }
      },
      "clients": 8,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}
```
The numbers under the documents property are used to verify data integrity and to report progress: source-file is the data source file, document-count is the number of records in that file, and uncompressed-bytes is the total uncompressed size. The official dataset may change over time, so confirm the actual values yourself: wc -l documents.json gives the correct record count, and stat -c %s documents.json gives the size in bytes.
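The same two checks can be scripted, for example to regenerate the document-count and uncompressed-bytes fields automatically. A minimal sketch (the helper name and the sample file are illustrative, not part of the tutorial):

```python
import os

def corpus_stats(path):
    """Count documents (lines) and uncompressed bytes of a JSON-lines
    corpus file -- the Python equivalent of `wc -l` and `stat -c %s`."""
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count, os.path.getsize(path)

# Demo on a tiny stand-in file for documents.json.
with open("sample-corpus.json", "w", encoding="UTF-8") as f:
    f.write('{"geonameid": 1}\n{"geonameid": 2}\n')

print(corpus_stats("sample-corpus.json"))  # → (2, 34)
```

The returned pair can be pasted straight into the corpora section of track.json.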
For a detailed description of the track structure, see the official documentation: track reference.
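The bulk step in the schedule streams documents.json to Elasticsearch in batches of 5,000. Rally builds those requests internally, but as an illustration, this is roughly the NDJSON shape of one _bulk batch (the helper and the two documents are made up for the example):

```python
import json

def bulk_body(index, docs):
    """Render one _bulk batch as NDJSON: an action line followed by the
    document source for every document, each line newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Two made-up documents standing in for records from documents.json.
docs = [{"geonameid": 1, "name": "A"}, {"geonameid": 2, "name": "B"}]
print(bulk_body("geonames", docs))
```

Each document thus costs two NDJSON lines, which is why bulk-size counts documents, not lines.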
Running the track

Command: the race runs in offline mode and performs the benchmark only (no cluster provisioning); it specifies the target ES cluster and the track path, and writes the results to a formatted report file.
```shell
esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.csv --report-format=csv
```
Execution output
```
[esrally@~ tutorial]$ esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.md --report-format=csv

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [87a1c0b8-314c-4531-9b23-856c7c5e107c]
[INFO] Racing on track [tutorial] and car ['external'] with version [6.8.0].
Running delete-index          [100% done]
Running create-index          [100% done]
Running cluster-health        [100% done]
Running bulk                  [100% done]
Running force-merge           [100% done]
Running query-match-all       [100% done]

------------------------------------------------------
       _______             __   _____
      / ____(_)___  ____ _/ /  / ___/_________  ________
     / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
    / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
   /_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
```
Results report

Metric | Task | Value | Unit | Meaning
---|---|---|---|---
Cumulative indexing time of primary shards | | 0 | min | Total indexing time on primary shards
Min cumulative indexing time across primary shards | | 0 | min | Minimum per-shard cumulative indexing time
Median cumulative indexing time across primary shards | | 0 | min | Median per-shard cumulative indexing time
Max cumulative indexing time across primary shards | | 0 | min | Maximum per-shard cumulative indexing time
Cumulative indexing throttle time of primary shards | | 0 | min | Total indexing throttle time on primary shards
Min cumulative indexing throttle time across primary shards | | 0 | min | Minimum per-shard indexing throttle time
Median cumulative indexing throttle time across primary shards | | 0 | min | Median per-shard indexing throttle time
Max cumulative indexing throttle time across primary shards | | 0 | min | Maximum per-shard indexing throttle time
Cumulative merge time of primary shards | | 0 | min | Total merge time on primary shards
Cumulative merge count of primary shards | | 0 | | Total merge count on primary shards
Min cumulative merge time across primary shards | | 0 | min | Minimum per-shard cumulative merge time
Median cumulative merge time across primary shards | | 0 | min | Median per-shard cumulative merge time
Max cumulative merge time across primary shards | | 0 | min | Maximum per-shard cumulative merge time
Cumulative merge throttle time of primary shards | | 0 | min | Total merge throttle time on primary shards
Min cumulative merge throttle time across primary shards | | 0 | min | Minimum per-shard merge throttle time
Median cumulative merge throttle time across primary shards | | 0 | min | Median per-shard merge throttle time
Max cumulative merge throttle time across primary shards | | 0 | min | Maximum per-shard merge throttle time
Cumulative refresh time of primary shards | | 0 | min | Total refresh time on primary shards
Cumulative refresh count of primary shards | | 15 | | Total refresh count on primary shards
Min cumulative refresh time across primary shards | | 0 | min | Minimum per-shard cumulative refresh time
Median cumulative refresh time across primary shards | | 0 | min | Median per-shard cumulative refresh time
Max cumulative refresh time across primary shards | | 0 | min | Maximum per-shard cumulative refresh time
Cumulative flush time of primary shards | | 0 | min | Total flush time on primary shards
Cumulative flush count of primary shards | | 0 | | Total flush count on primary shards
Min cumulative flush time across primary shards | | 0 | min | Minimum per-shard cumulative flush time
Median cumulative flush time across primary shards | | 0 | min | Median per-shard cumulative flush time
Max cumulative flush time across primary shards | | 0 | min | Maximum per-shard cumulative flush time
Total Young Gen GC time | | 2.694 | s | Total young-generation GC time
Total Young Gen GC count | | 170 | | Total young-generation GC count
Total Old Gen GC time | | 0 | s | Total old-generation GC time
Total Old Gen GC count | | 0 | | Total old-generation GC count
Store size | | 1.07E-06 | GB | Index storage size
Translog size | | 5.12E-07 | GB | Translog size
Heap used for segments | | 0 | MB | Heap memory used by segments
Heap used for doc values | | 0 | MB | Heap memory used by doc values
Heap used for terms | | 0 | MB | Heap memory used by terms
Heap used for norms | | 0 | MB | Heap memory used by norms
Heap used for points | | 0 | MB | Heap memory used by points
Heap used for stored fields | | 0 | MB | Heap memory used by stored fields
Segment count | | 0 | | Number of segments
Total Ingest Pipeline count | | 0 | |
Total Ingest Pipeline time | | 0 | s |
Total Ingest Pipeline failed | | 0 | |
error rate | bulk | 0 | % | Error rate
Min Throughput | query-match-all | 100 | ops/s |
Mean Throughput | query-match-all | 100 | ops/s |
Median Throughput | query-match-all | 100 | ops/s |
Max Throughput | query-match-all | 100 | ops/s |
50th percentile latency | query-match-all | 2.518748515 | ms |
90th percentile latency | query-match-all | 3.393146186 | ms |
99th percentile latency | query-match-all | 4.929880542 | ms |
99.9th percentile latency | query-match-all | 6.498478545 | ms |
100th percentile latency | query-match-all | 8.77224002 | ms |
50th percentile service time | query-match-all | 1.522833598 | ms |
90th percentile service time | query-match-all | 1.95039534 | ms |
99th percentile service time | query-match-all | 3.240323039 | ms |
99.9th percentile service time | query-match-all | 4.757250794 | ms |
100th percentile service time | query-match-all | 6.071650889 | ms |
error rate | query-match-all | 0 | % |
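Because the race was started with --report-format=csv, the numbers above are also written to a CSV report file that can be post-processed. A minimal sketch for pulling out the query latency percentiles; the inline rows stand in for the real file, and the Metric/Task/Value/Unit column layout is assumed to match this run's report:

```python
import csv
import io

# A few rows standing in for the generated report.csv.
report = io.StringIO(
    "Metric,Task,Value,Unit\n"
    "50th percentile latency,query-match-all,2.518748515,ms\n"
    "90th percentile latency,query-match-all,3.393146186,ms\n"
    "error rate,query-match-all,0,%\n"
)

# Collect latency metrics for the query-match-all task.
latencies = {
    row["Metric"]: float(row["Value"])
    for row in csv.DictReader(report)
    if row["Task"] == "query-match-all" and "latency" in row["Metric"]
}
print(latencies)
# → {'50th percentile latency': 2.518748515, '90th percentile latency': 3.393146186}
```

For the real file, replace the io.StringIO stand-in with open("report.csv", newline="").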