[ES in Practice] Rally: benchmarking with a custom track in offline mode

Posted by 顧棟



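Offline installation

Install Rally offline (see the separate article on installing Rally offline). After installation, leave the default configuration unchanged.

The ES cluster under test is deployed separately on other machines. It may be a self-modified ES build rather than an official release package.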

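Glossary

track
A track is the description of one or more benchmarking scenarios with a specific document corpus. It defines, for example, the indices involved, the data files, and the operations that are invoked. List the available tracks with esrally list tracks. Although Rally ships with some tracks out of the box, you should usually create your own track based on your own data.

In short: the basic building blocks of a benchmark scenario, such as the read and write operations run against the cluster and the data that is indexed.

challenge
A challenge describes one benchmarking scenario, for example indexing documents at maximum throughput with 4 clients while issuing term and phrase queries from another two clients, each rate-limited to 10 queries per second. It is always specified in the context of a track. To see the available challenges, list the corresponding tracks with esrally list tracks.

In short: the dimensions along which a benchmark scenario is varied, such as different levels of concurrency.

car
A car is a specific configuration of an Elasticsearch cluster, for example the out-of-the-box configuration, a configuration with a specific heap size, or a custom logging configuration. List the available cars with esrally list cars.

In short: the configuration of the ES cluster used in a benchmark scenario.

telemetry
Telemetry is used in Rally to gather metrics about the car, for example CPU usage or index size.

In short: the metrics that measure the outcome of a benchmark scenario.

race
A race is one invocation of the Rally binary. Another name for it is "benchmarking trial". During a race, Rally runs one challenge on a track with the given car.

In short: one execution of a benchmark.

tournament
A tournament is a comparison of two races. You can use Rally's tournament mode for that.

In short: a comparison of different benchmark runs.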


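Custom track: tutorial

Create the custom track directory /opt/software/esrally/rally-tracks/tutorial.

The custom track directory needs three files:

  • documents.json: the structured data to be indexed

    The data comes from the official test data set allCountries.zip (around 300MB). Unpacking it yields a single file, allCountries.txt. A Python script then converts the data to JSON.

    Python script (toJSON.py):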

    import json
    
    cols = (("geonameid", "int", True),
            ("name", "string", True),
            ("asciiname", "string", False),
            ("alternatenames", "string", False),
            ("latitude", "double", True),
            ("longitude", "double", True),
            ("feature_class", "string", False),
            ("feature_code", "string", False),
            ("country_code", "string", True),
            ("cc2", "string", False),
            ("admin1_code", "string", False),
            ("admin2_code", "string", False),
            ("admin3_code", "string", False),
            ("admin4_code", "string", False),
            ("population", "long", True),
            ("elevation", "int", False),
            ("dem", "string", False),
            ("timezone", "string", False))
    
    
    def main():
        # Each input line is one tab-separated GeoNames record.
        with open("allCountries.txt", "rt", encoding="UTF-8") as f:
            for line in f:
                tup = line.strip().split("\t")
                record = {}
                for i, (name, col_type, include) in enumerate(cols):
                    # Keep only non-empty columns that are flagged for inclusion.
                    if tup[i] != "" and include:
                        if col_type in ("int", "long"):
                            record[name] = int(tup[i])
                        elif col_type == "double":
                            record[name] = float(tup[i])
                        elif col_type == "string":
                            record[name] = tup[i]
                # Emit one JSON document per line.
                print(json.dumps(record, ensure_ascii=False))
    
    
    if __name__ == "__main__":
        main()
    

    Run the script: python3 toJSON.py > documents.json

    Each line of documents.json is one JSON document, for example:

    {"geonameid": 2986043, "name": "Pic de Font Blanca", "latitude": 42.64991, "longitude": 1.53335, "country_code": "AD", "population": 0}
    {"geonameid": 2994701, "name": "Roc Mélé", "latitude": 42.58765, "longitude": 1.74028, "country_code": "AD", "population": 0}
    {"geonameid": 3007683, "name": "Pic des Langounelles", "latitude": 42.61203, "longitude": 1.47364, "country_code": "AD", "population": 0}
    {"geonameid": 3017832, "name": "Pic de les Abelletes", "latitude": 42.52535, "longitude": 1.73343, "country_code": "AD", "population": 0}
    {"geonameid": 3017833, "name": "Estany de les Abelletes", "latitude": 42.52915, "longitude": 1.73362, "country_code": "AD", "population": 0}
    {"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "latitude": 42.62568, "longitude": 1.61823, "country_code": "AD", "population": 0}
    {"geonameid": 3029315, "name": "Port de la Cabanette", "latitude": 42.6, "longitude": 1.73333, "country_code": "AD", "population": 0}
    {"geonameid": 3034945, "name": "Port Dret", "latitude": 42.60172, "longitude": 1.45562, "country_code": "AD", "population": 0}
    {"geonameid": 3038814, "name": "Costa de Xurius", "latitude": 42.50692, "longitude": 1.47569, "country_code": "AD", "population": 0}
    {"geonameid": 3038815, "name": "Font de la Xona", "latitude": 42.55003, "longitude": 1.44986, "country_code": "AD", "population": 0}
    {"geonameid": 3038816, "name": "Xixerella", "latitude": 42.55327, "longitude": 1.48736, "country_code": "AD", "population": 0}
    {"geonameid": 3038818, "name": "Riu Xic", "latitude": 42.57165, "longitude": 1.67554, "country_code": "AD", "population": 0}
    {"geonameid": 3038819, "name": "Pas del Xic", "latitude": 42.49766, "longitude": 1.57597, "country_code": "AD", "population": 0}
    {"geonameid": 3038820, "name": "Roc del Xeig", "latitude": 42.56068, "longitude": 1.4898, "country_code": "AD", "population": 0}
    
  • index.json: the index structure (mappings and settings)

    {
      "settings": {
        "index.number_of_replicas": 0
      },
      "mappings": {
        "_doc": {
          "dynamic": "strict",
          "properties": {
            "geonameid": {
              "type": "long"
            },
            "name": {
              "type": "text"
            },
            "latitude": {
              "type": "double"
            },
            "longitude": {
              "type": "double"
            },
            "country_code": {
              "type": "text"
            },
            "population": {
              "type": "long"
            }
          }
        }
      }
    }
    
  • track.json: describes the indices used in the benchmark and the schedule of benchmark tasks

    {
      "version": 2,
      "description": "Tutorial benchmark for Rally",
      "indices": [
        {
          "name": "geonames",
          "body": "index.json"
        }
      ],
      "corpora": [
        {
          "name": "rally-tutorial",
          "documents": [
            {
              "source-file": "documents.json",
              "document-count": 11658903,
              "uncompressed-bytes": 1544799789
            }
          ]
        }
      ],
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            },
            "retry-until-success": true
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000
          },
          "warmup-time-period": 120,
          "clients": 8
        },
        {
          "operation": {
            "operation-type": "force-merge"
          }
        },
        {
          "operation": {
            "name": "query-match-all",
            "operation-type": "search",
            "body": {
              "query": {
                "match_all": {}
              }
            }
          },
          "clients": 8,
          "warmup-iterations": 1000,
          "iterations": 1000,
          "target-throughput": 100
        }
      ]
    }
    

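The numbers under the documents property are used to verify integrity and to provide progress reports. source-file is the data source file, document-count is the number of documents in that file, and uncompressed-bytes is the total size of the uncompressed file in bytes. The data on the official site may change over time, so confirm the actual values yourself: wc -l documents.json gives the number of documents, and stat -c %s documents.json gives the size in bytes.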
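The same two numbers can also be computed in Python, which is handy on machines without GNU stat. This is a minimal sketch, not part of Rally: the function name corpus_stats is made up for illustration, and it assumes documents.json holds exactly one JSON document per line.

```python
import os


def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a one-document-per-line file.

    Hypothetical helper for illustration; it mirrors `wc -l` and `stat -c %s`.
    """
    document_count = 0
    # Read in binary mode so no decoding is needed just to count lines.
    with open(path, "rb") as f:
        for _ in f:
            document_count += 1
    uncompressed_bytes = os.path.getsize(path)
    return document_count, uncompressed_bytes
```

Calling corpus_stats("documents.json") yields the values to paste into document-count and uncompressed-bytes in track.json. Unlike wc -l, this loop also counts a final line that lacks a trailing newline.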

For a detailed description of the track structure, see the track reference in the official Rally documentation.
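Because index.json sets "dynamic" to "strict", Elasticsearch rejects any document that contains a field missing from the mapping. Before starting a long run, it can be worth checking a sample of documents.json against index.json. The sketch below is an illustration only: the helper names are made up, and it assumes the mapping is nested under the "_doc" type as in the index.json shown here.

```python
import json


def unmapped_fields(index_body, doc, type_name="_doc"):
    """Return the fields of doc that are not declared in the mapping (sorted)."""
    # Assumption: mappings are nested under a type name, as in this article's index.json.
    properties = index_body["mappings"][type_name]["properties"]
    return sorted(set(doc) - set(properties))


def check_corpus(index_path, corpus_path, limit=1000):
    """Check the first `limit` documents and return [(line_number, extra_fields), ...]."""
    with open(index_path, encoding="utf-8") as f:
        index_body = json.load(f)
    problems = []
    with open(corpus_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line_no > limit:
                break
            extra = unmapped_fields(index_body, json.loads(line))
            if extra:
                problems.append((line_no, extra))
    return problems
```

An empty list from check_corpus("index.json", "documents.json") means the sampled documents only use mapped fields. It does not validate field types.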

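Running the track

Command: Rally must run in offline mode and only perform the benchmark. The command specifies the target ES cluster and the track path, and writes the results to a report file in a chosen format.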

esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.csv --report-format=csv
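When the same race has to be repeated against several clusters or tracks, the command line can be assembled in Python instead of edited by hand. This is a sketch under assumptions: the function names are hypothetical, and only the flags already used in the command above are included.

```python
import subprocess


def build_race_command(target_hosts, track_path, report_file, report_format="csv"):
    """Build the argument list for the esrally invocation used in this article."""
    return [
        "esrally", "race",
        "--pipeline=benchmark-only",        # benchmark an existing cluster only
        f"--target-hosts={target_hosts}",
        f"--track-path={track_path}",
        "--offline",                        # run without network access
        f"--report-file={report_file}",
        f"--report-format={report_format}",
    ]


def run_race(cmd):
    """Run esrally; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems. Joining the list with spaces reproduces the command shown above.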

Execution output

[esrally@~ tutorial]$ esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.md --report-format=csv

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [87a1c0b8-314c-4531-9b23-856c7c5e107c]
[INFO] Racing on track [tutorial] and car ['external'] with version [6.8.0].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]
Running force-merge                                                            [100% done]
Running query-match-all                                                        [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

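Result report

| Metric | Task | Value | Unit | Meaning |
|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 0 | min | Cumulative indexing time of the primary shards |
| Min cumulative indexing time across primary shards | | 0 | min | Minimum cumulative indexing time across shards |
| Median cumulative indexing time across primary shards | | 0 | min | Median cumulative indexing time across shards |
| Max cumulative indexing time across primary shards | | 0 | min | Maximum cumulative indexing time across shards |
| Cumulative indexing throttle time of primary shards | | 0 | min | Cumulative indexing throttle time of the primary shards |
| Min cumulative indexing throttle time across primary shards | | 0 | min | Minimum cumulative indexing throttle time across shards |
| Median cumulative indexing throttle time across primary shards | | 0 | min | Median cumulative indexing throttle time across shards |
| Max cumulative indexing throttle time across primary shards | | 0 | min | Maximum cumulative indexing throttle time across shards |
| Cumulative merge time of primary shards | | 0 | min | Cumulative merge time of the primary shards |
| Cumulative merge count of primary shards | | 0 | | Cumulative number of merges on the primary shards |
| Min cumulative merge time across primary shards | | 0 | min | Minimum cumulative merge time across primary shards |
| Median cumulative merge time across primary shards | | 0 | min | Median cumulative merge time across primary shards |
| Max cumulative merge time across primary shards | | 0 | min | Maximum cumulative merge time across primary shards |
| Cumulative merge throttle time of primary shards | | 0 | min | Cumulative merge throttle time of the primary shards |
| Min cumulative merge throttle time across primary shards | | 0 | min | Minimum cumulative merge throttle time across primary shards |
| Median cumulative merge throttle time across primary shards | | 0 | min | Median cumulative merge throttle time across primary shards |
| Max cumulative merge throttle time across primary shards | | 0 | min | Maximum cumulative merge throttle time across primary shards |
| Cumulative refresh time of primary shards | | 0 | min | Cumulative refresh time of the primary shards |
| Cumulative refresh count of primary shards | | 15 | | Cumulative number of refreshes on the primary shards |
| Min cumulative refresh time across primary shards | | 0 | min | Minimum cumulative refresh time across primary shards |
| Median cumulative refresh time across primary shards | | 0 | min | Median cumulative refresh time across primary shards |
| Max cumulative refresh time across primary shards | | 0 | min | Maximum cumulative refresh time across primary shards |
| Cumulative flush time of primary shards | | 0 | min | Cumulative flush time of the primary shards |
| Cumulative flush count of primary shards | | 0 | | Cumulative number of flushes on the primary shards |
| Min cumulative flush time across primary shards | | 0 | min | Minimum cumulative flush time across primary shards |
| Median cumulative flush time across primary shards | | 0 | min | Median cumulative flush time across primary shards |
| Max cumulative flush time across primary shards | | 0 | min | Maximum cumulative flush time across primary shards |
| Total Young Gen GC time | | 2.694 | s | Total young-generation GC time |
| Total Young Gen GC count | | 170 | | Total number of young-generation GCs |
| Total Old Gen GC time | | 0 | s | Total old-generation GC time |
| Total Old Gen GC count | | 0 | | Total number of old-generation GCs |
| Store size | | 1.07E-06 | GB | Store size |
| Translog size | | 5.12E-07 | GB | Translog size |
| Heap used for segments | | 0 | MB | Heap memory used by segments |
| Heap used for doc values | | 0 | MB | Heap memory used by doc values |
| Heap used for terms | | 0 | MB | Heap memory used by terms |
| Heap used for norms | | 0 | MB | Heap memory used by norms |
| Heap used for points | | 0 | MB | Heap memory used by points |
| Heap used for stored fields | | 0 | MB | Heap memory used by stored fields |
| Segment count | | 0 | | Number of segments |
| Total Ingest Pipeline count | | 0 | | |
| Total Ingest Pipeline time | | 0 | s | |
| Total Ingest Pipeline failed | | 0 | | |
| error rate | bulk | 0 | % | Error rate |
| Min Throughput | query-match-all | 100 | ops/s | |
| Mean Throughput | query-match-all | 100 | ops/s | |
| Median Throughput | query-match-all | 100 | ops/s | |
| Max Throughput | query-match-all | 100 | ops/s | |
| 50th percentile latency | query-match-all | 2.518748515 | ms | |
| 90th percentile latency | query-match-all | 3.393146186 | ms | |
| 99th percentile latency | query-match-all | 4.929880542 | ms | |
| 99.9th percentile latency | query-match-all | 6.498478545 | ms | |
| 100th percentile latency | query-match-all | 8.77224002 | ms | |
| 50th percentile service time | query-match-all | 1.522833598 | ms | |
| 90th percentile service time | query-match-all | 1.95039534 | ms | |
| 99th percentile service time | query-match-all | 3.240323039 | ms | |
| 99.9th percentile service time | query-match-all | 4.757250794 | ms | |
| 100th percentile service time | query-match-all | 6.071650889 | ms | |
| error rate | query-match-all | 0 | % | |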
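Since the report is written as CSV, it is easy to load for further processing, for example to compare a few metrics between runs. The following is a rough sketch and not part of Rally: it assumes the report file starts with the header row Metric,Task,Value,Unit (the column names shown in the report above), and the function name load_report is made up. The sample rows reuse values from the report above.

```python
import csv
import io


def load_report(lines):
    """Parse CSV report lines into {(metric, task): (value, unit)}.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    results = {}
    for row in csv.DictReader(lines):
        value = row["Value"]
        try:
            value = float(value)
        except (TypeError, ValueError):
            pass  # keep non-numeric or missing values unchanged
        results[(row["Metric"], row["Task"])] = (value, row["Unit"])
    return results


# Sample input in the assumed layout, using rows from the report above.
SAMPLE = """Metric,Task,Value,Unit
Total Young Gen GC time,,2.694,s
99th percentile latency,query-match-all,4.929880542,ms
99th percentile service time,query-match-all,3.240323039,ms
"""

report = load_report(io.StringIO(SAMPLE))
```

For a real run, pass an open file instead, for example load_report(open("/opt/software/esrally/report.csv", newline="")). Metrics that are not tied to a task use an empty string as the task part of the key.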
