InfluxDB query with a large response is too slow

Posted: 2020-01-03 07:35:59

Question:

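Our query takes 20 seconds, and we need to cut that down drastically. We call it through the Python DataFrameClient, but I reproduced the same query, with the same 20-second response time, using the CLI client: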

influx --host 10.0.5.183 --precision RFC3339 -execute "select * from turbine_ops.permanent.turbine_interval where ((turbine_id = 'NKWF-T15' or turbine_id = 'NKWF-T41' or turbine_id = 'NKWF-T23' or turbine_id = 'NKWF-T19' or turbine_id = 'NKWF-T51' or turbine_id = 'NKWF-T14' or turbine_id = 'NKWF-T42' or turbine_id = 'NKWF-T26' or turbine_id = 'NKWF-T39' or turbine_id = 'NKWF-T49' or turbine_id = 'NKWF-T38') and time >= '2019-05-01')">/dev/null
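
The OR chain in the command above is long and easy to get wrong by hand. Below is a minimal sketch of generating it from a list of turbine IDs, assuming the measurement and start date used above. `build_or_query` and `build_regex_query` are hypothetical helpers. Neither escapes quotes or regex metacharacters, so both assume IDs shaped like `NKWF-T15`. The regex variant uses InfluxQL's `=~` tag match to express the same filter more compactly, but it is not claimed to be faster.

```python
from typing import Sequence

# Measurement name taken from the question's query
MEASUREMENT = "turbine_ops.permanent.turbine_interval"


def build_or_query(turbine_ids: Sequence[str], start: str) -> str:
    """Hypothetical helper: one `turbine_id = '...'` clause per turbine, joined with OR.

    Assumes IDs contain no single quotes (nothing is escaped).
    """
    clauses = " or ".join(f"turbine_id = '{t}'" for t in turbine_ids)
    return f"select * from {MEASUREMENT} where (({clauses}) and time >= '{start}')"


def build_regex_query(turbine_ids: Sequence[str], start: str) -> str:
    """Hypothetical helper: the same filter as a single anchored regex on the tag.

    Assumes IDs contain no regex metacharacters (nothing is escaped).
    """
    alternation = "|".join(turbine_ids)
    return f"select * from {MEASUREMENT} where turbine_id =~ /^({alternation})$/ and time >= '{start}'"


# Turbine IDs in the same order as the question's query
TURBINES = ["NKWF-T15", "NKWF-T41", "NKWF-T23", "NKWF-T19", "NKWF-T51",
            "NKWF-T14", "NKWF-T42", "NKWF-T26", "NKWF-T39", "NKWF-T49", "NKWF-T38"]

if __name__ == "__main__":
    print(build_or_query(TURBINES, "2019-05-01"))
    print(build_regex_query(TURBINES, "2019-05-01"))
```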

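InfluxDB runs on an r5.large EC2 instance with a General Purpose SSD (gp2) EBS volume, and the CLI runs on an EC2 instance in the same subnet. The query returns 747,120 rows, each with 1 tag (turbine_id) and 5 fields (all numeric values). Does this look normal?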
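
To judge whether 20 seconds is normal, it helps to put the numbers from the question on a common scale. The small calculation below uses only figures stated in the question. The per-series value is an average, since the question does not say the rows are spread evenly across turbines.

```python
# Figures taken from the question; everything else is derived from them.
ROWS = 747_120          # rows returned by the query
SERIES = 11             # one series per turbine_id
FIELDS_PER_ROW = 5      # active_power, duration, rotor_rpm, wind_speed, yaw_direction
CLI_SECONDS = 20.0      # observed end-to-end time via the CLI / DataFrameClient

rows_per_series = ROWS / SERIES             # average rows per turbine
field_values = ROWS * FIELDS_PER_ROW        # total field values shipped to the client
rows_per_second = ROWS / CLI_SECONDS        # effective end-to-end throughput

print(f"rows per series (avg): {rows_per_series:,.0f}")
print(f"field values: {field_values:,}")
print(f"rows/s at {CLI_SECONDS:.0f} s: {rows_per_second:,.0f}")
```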

Watching htop on the InfluxDB host, I saw no significant change in RAM usage, a brief CPU spike of about 1 second at the start of the query, and no CPU activity after that.

The shard duration is set to 1 year.

show series exact cardinality on turbine_ops
name: turbine_interval
count
-----
11

I tried scaling the InfluxDB host up to an r5.8xlarge, and the query time did not change at all.

explain select * from turbine_ops.permanent.turbine_interval where ((turbine_ = 'NKWF-T15' or turbine_id = 'NKWF-T41' or turbine_id = 'NKWF-T23' or turbine_id = 'NKWF-T19' or turbine_id = 'NKWF-T51' or turbine_id = 'NKWF-T14' or turbine_id = 'NKWF-T42' or turbine_id = 'NKWF-T26' or turbine_id = 'NKWF-T39' or turbine_id = 'NKWF-T49' or turbine_id = 'NKWF-T38') and time >= '2019-05-01')

    QUERY PLAN
    EXPRESSION: 
    AUXILIARY FIELDS: active_power::float, "duration"::integer, rotor_rpm::float, turbine_id::tag, wind_speed::float, yaw_direction::float
    NUMBER OF SHARDS: 1
    NUMBER OF SERIES: 10
    CACHED VALUES: 0
    NUMBER OF FILES: 150
    NUMBER OF BLOCKS: 3515
    SIZE OF BLOCKS: 12403470

explain analyze select * from turbine_ops.permanent.turbine_interval where ((turbine_ = 'NKWF-T15' or turbine_id = 'NKWF-T41' or turbine_id = 'NKWF-T23' or turbine_id = 'NKWF-T19' or turbine_id = 'NKWF-T51' or turbine_id = 'NKWF-T14' or turbine_id = 'NKWF-T42' or turbine_id = 'NKWF-T26' or turbine_id = 'NKWF-T39' or turbine_id = 'NKWF-T49' or turbine_id = 'NKWF-T38') and time >= '2019-05-01')

    EXPLAIN ANALYZE
    .
    └── select
        ├── execution_time: 1.442047426s
        ├── planning_time: 2.105094ms
        ├── total_time: 1.44415252s
        └── build_cursor
            ├── labels
            │   └── statement: SELECT active_power::float, "duration"::integer, rotor_rpm::float, turbine_id::tag, wind_speed::float, yaw_direction::float FROM turbine_ops.permanent.turbine_interval WHERE turbine_ = 'NKWF-T15' OR turbine_id::tag = 'NKWF-T41' OR turbine_id::tag = 'NKWF-T23' OR turbine_id::tag = 'NKWF-T19' OR turbine_id::tag = 'NKWF-T51' OR turbine_id::tag = 'NKWF-T14' OR turbine_id::tag = 'NKWF-T42' OR turbine_id::tag = 'NKWF-T26' OR turbine_id::tag = 'NKWF-T39' OR turbine_id::tag = 'NKWF-T49' OR turbine_id::tag = 'NKWF-T38'
            └── iterator_scanner
                ├── labels
                │   └── auxiliary_fields: active_power::float, "duration"::integer, rotor_rpm::float, turbine_id::tag, wind_speed::float, yaw_direction::float
                └── create_iterator
                    ├── labels
                    │   ├── cond: turbine_ = 'NKWF-T15' OR turbine_id::tag = 'NKWF-T41' OR turbine_id::tag = 'NKWF-T23' OR turbine_id::tag = 'NKWF-T19' OR turbine_id::tag = 'NKWF-T51' OR turbine_id::tag = 'NKWF-T14' OR turbine_id::tag = 'NKWF-T42' OR turbine_id::tag = 'NKWF-T26' OR turbine_id::tag = 'NKWF-T39' OR turbine_id::tag = 'NKWF-T49' OR turbine_id::tag = 'NKWF-T38'
                    │   ├── measurement: turbine_interval
                    │   └── shard_id: 1584
                    ├── cursors_ref: 0
                    ├── cursors_aux: 50
                    ├── cursors_cond: 0
                    ├── float_blocks_decoded: 2812
                    ├── float_blocks_size_bytes: 12382380
                    ├── integer_blocks_decoded: 703
                    ├── integer_blocks_size_bytes: 21090
                    ├── unsigned_blocks_decoded: 0
                    ├── unsigned_blocks_size_bytes: 0
                    ├── string_blocks_decoded: 0
                    ├── string_blocks_size_bytes: 0
                    ├── boolean_blocks_decoded: 0
                    ├── boolean_blocks_size_bytes: 0
                    └── planning_time: 1.624627ms
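
The EXPLAIN output already hints at where the time goes. The back-of-the-envelope check below uses only the numbers above. The decoded float and integer blocks add up exactly to the planner's NUMBER OF BLOCKS and SIZE OF BLOCKS, and the server-side total_time is a small fraction of the 20 s observed end to end. This assumes the EXPLAIN ANALYZE timings exclude encoding the response, sending it over the network, and parsing it on the client. Note also that this EXPLAIN was run with `turbine_ = 'NKWF-T15'`, and the plan reports 10 series rather than 11 (likely because `turbine_` matches no tag), so it slightly undercounts the real query.

```python
# Numbers copied from the EXPLAIN and EXPLAIN ANALYZE output above.
PLAN_BLOCKS = 3515
PLAN_BLOCK_BYTES = 12_403_470
float_blocks, float_bytes = 2812, 12_382_380
integer_blocks, integer_bytes = 703, 21_090
server_total_s = 1.44415252      # total_time reported by EXPLAIN ANALYZE
end_to_end_s = 20.0              # observed via the CLI / DataFrameClient

# The decoded blocks account for every block the planner expected to read.
assert float_blocks + integer_blocks == PLAN_BLOCKS
assert float_bytes + integer_bytes == PLAN_BLOCK_BYTES

# Fraction of the 20 s spent inside the query engine vs. everywhere else
server_share = server_total_s / end_to_end_s
outside_s = end_to_end_s - server_total_s
print(f"server share: {server_share:.1%}, outside the engine: {outside_s:.1f} s")
```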

Please let me know of any optimizations we could make.

Answer 1:

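My suspicion that InfluxDB itself was not the culprit was confirmed when I queried the HTTP API directly with curl and got a response in about 3 seconds. I'm not sure why the CLI or the Python DataFrameClient adds so much overhead, but with the following code I got a pandas DataFrame in 3.78 seconds: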

import urllib.parse
import urllib.request
from io import BytesIO

import pandas as pd

# Query-string parameters for the InfluxDB 1.x /query HTTP endpoint
data = {}
data['db'] = 'turbine_ops'
data['epoch'] = 's'  # /query takes `epoch` for timestamp precision; `precision` only applies to /write
data['q'] = "select * from turbine_ops.permanent.turbine_interval where ((turbine_id = 'NKWF-T15' or turbine_id = 'NKWF-T41' or turbine_id = 'NKWF-T23' or turbine_id = 'NKWF-T19' or turbine_id = 'NKWF-T51' or turbine_id = 'NKWF-T14' or turbine_id = 'NKWF-T42' or turbine_id = 'NKWF-T26' or turbine_id = 'NKWF-T39' or turbine_id = 'NKWF-T49' or turbine_id = 'NKWF-T38') and time >= '2019-05-01')"
url_values = urllib.parse.urlencode(data)
url = "http://10.0.5.183:8086/query?" + url_values
# Ask InfluxDB for CSV instead of its default JSON response
request = urllib.request.Request(url, headers={'Accept': 'application/csv'})
with urllib.request.urlopen(request) as response:
    response_bytestr = response.read()
# Parse the CSV body directly into a DataFrame
df = pd.read_csv(BytesIO(response_bytestr), sep=",")

This is a good start, but faster is better, so please post any other solutions.
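
The frame produced by the code above still carries InfluxDB's CSV layout. Below is a minimal sketch of reshaping it into a time-indexed DataFrame. It assumes the InfluxDB 1.x CSV header begins with `name,tags,time` and that timestamps are epoch seconds (in 1.x the /query endpoint sets this with `epoch=s`; `precision` is a /write parameter). `influx_csv_to_frame` and `SAMPLE` are hypothetical, and the sample rows are made-up illustration data, not real turbine readings.

```python
from io import BytesIO

import pandas as pd


def influx_csv_to_frame(payload: bytes) -> pd.DataFrame:
    """Hypothetical helper: InfluxDB 1.x CSV response -> DataFrame indexed by UTC time.

    Assumed layout: header starts with name,tags,time, followed by the selected
    columns; `time` holds integer epoch seconds (request sent with epoch=s).
    """
    df = pd.read_csv(BytesIO(payload))
    # `name` is the measurement and `tags` is empty without GROUP BY; drop both.
    df = df.drop(columns=["name", "tags"])
    df["time"] = pd.to_datetime(df["time"], unit="s", utc=True)
    return df.set_index("time")


# Hypothetical sample payload in the assumed format (made-up values).
SAMPLE = (
    b"name,tags,time,active_power,duration,rotor_rpm,turbine_id,wind_speed,yaw_direction\n"
    b"turbine_interval,,1556668800,1510.5,600,14.2,NKWF-T15,9.8,271.0\n"
    b"turbine_interval,,1556669400,1498.0,600,14.1,NKWF-T15,9.6,270.5\n"
)

if __name__ == "__main__":
    print(influx_csv_to_frame(SAMPLE))
```

With a real response, `influx_csv_to_frame(response_bytestr)` would replace the final `pd.read_csv` call.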
