New to time series databases: selecting data from TimescaleDB for Grafana is slow and the query is complex

Posted: 2021-08-12 16:06:06

Question:

I'm new to TimescaleDB and Grafana, and the query I'm using is slow. I think the query I wrote can be improved a lot, but I'm not sure how. Any advice is appreciated.

I have the following table:

create table key_value_data (
    "time" timestamptz not null,
    "context" varchar null,
    "key" varchar[] not null,
    "path" varchar not null,
    "value" varchar not null
);

select create_hypertable('key_value_data', 'time');

Then I try to select data for https://github.com/panodata/grafana-map-panel. I have tried to explain what I want to achieve in the comments of the query:

select
    * -- this removes all rows with a null value in one of the columns, see where statement below
from
(
select
    time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
    "name", -- the name of the vessel
    locf(last("lon", "time")) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("lat", "time")) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("sog", "time")) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("navstate", "time")) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket
from
(
select
     "ais"."time",
    case when "names"."name" is null then "ais"."context" else "names"."name" end as "name",
    max(case when "ais"."path" = 'navigation.position.longitude' then "ais"."value"::numeric else null end) as "lon",
    max(case when "ais"."path" = 'navigation.position.latitude' then "ais"."value"::numeric else null end) as "lat",
    max(case when "ais"."path" = 'navigation.speedOverGround' then "ais"."value"::numeric * 3.6 else null end) as "sog",
    max(case when "ais"."path" = 'navigation.state' then "ais"."value"::varchar else null end) as "navstate"
from
(
select
    "time",
    "context",
    "path",
    "value"
from
    "key_value_data"
where
  $__timeFilter("time") and
    "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
order by
    1, 2, 3
) as "ais" -- this is a subquery to pivot the data, I cannot get the crosstab function working because I don't know how to insert the grafana $__timeFilter in the query text
inner join
(
select
    "context",
    last("value", "time") as "name"
from
    "key_value_data" as "names"
where
  $__timeFilter("time") and
    "path" = 'name'
group by 
    1
) as "names" -- I made a separate query to retrieve the name of the vessel because this value is not injected in the table every x seconds but less frequent
on "ais"."context" = "names"."context"
group by
    1, 2
) as "bucket"
where
  $__timeFilter("time")
group by
    1, 2
) as "result"
where
    "lon" is not null and -- remove all rows with a null value in one of these columns
    "lat" is not null and
    "sog" is not null and
    "navstate" is not null

The query I ended up with is both complex and slow, and I think there should be a simpler way to do this.

Successfully run. Total query runtime: 465 msec.
106233 rows affected.

Questions:

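- Is the key-value approach a good way to store the data in the key_value_data table? I don't know in advance which keys will be available; that depends on the sensors available on each vessel.
- Is there a simpler way to pivot the data that still works with Grafana's $__timeFilter function?
- Is pivoting the data necessary at all, or can Grafana handle key-value data without pivoting?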

Comments on the question:

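465 ms is not very fast, but it is not that slow either, and you are processing 106233 records, so this may be a hardware issue. Can you show us the output of EXPLAIN (ANALYZE, BUFFERS, VERBOSE) for this query?

For the query plan, see gist.github.com/munnik/89a160a65454dd71f7e373459cf1a89b

Answer 1: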

A few things:

The key-value approach can be a reasonable way to store this, although I would make a few changes:
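- I would definitely split the name data out into a separate table and filter those messages separately. Your query already treats them differently, so you should handle them separately at ingest time rather than at query time.
- You should probably consider adding an index, probably one on (path, time). This will matter more as the data grows, but it already looks like most of your time is spent scanning the table: right now the only index is on time, so the query has to read and discard the rows for all the other keys. You could also consider a separate table for your keys, so that you can store the path as an integer or something similar.
- Storing values as varchar or text and casting them to numeric is probably a poor choice: it costs a lot of storage and adds a lot of overhead. I suggest using multiple columns that hold the actual types you want to store. If you can, avoid numeric unless you really need full precision for some reason, and use floats for better performance. (Also, if you enable compression, float/double precision compresses much better than numeric.)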
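A minimal sketch of what these three suggestions could look like as a schema, assuming plain PostgreSQL (the create_hypertable call from the question is left as a comment so the block also runs without TimescaleDB). The table names paths, measurements and vessel_names, their columns, and the smallint key are hypothetical choices for illustration, not something taken from the question:

```sql
-- Hypothetical lookup table: each sensor path is stored once and referenced by a small integer
CREATE TABLE paths (
    path_id smallint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    path    text NOT NULL UNIQUE
);

-- Hypothetical measurement table: one column per value type instead of casting varchar at query time
CREATE TABLE measurements (
    "time"     timestamptz NOT NULL,
    context    text        NOT NULL,
    path_id    smallint    NOT NULL REFERENCES paths (path_id),
    value_num  double precision,  -- numeric readings such as latitude, longitude, speed
    value_text text               -- textual readings such as navigation.state
);
-- With TimescaleDB installed, convert it to a hypertable as in the question:
-- SELECT create_hypertable('measurements', 'time');

-- Composite index so a query for a few paths in a time range does not scan every key.
-- On the existing table the equivalent would be: CREATE INDEX ON key_value_data ("path", "time" DESC);
CREATE INDEX measurements_path_time_idx ON measurements (path_id, "time" DESC);

-- Hypothetical separate table for the infrequent vessel names, routed here at ingest time
CREATE TABLE vessel_names (
    context text        NOT NULL,
    "time"  timestamptz NOT NULL,
    name    text        NOT NULL,
    PRIMARY KEY (context, "time")
);

-- Register the paths used by the map panel query
INSERT INTO paths (path) VALUES
    ('navigation.position.longitude'),
    ('navigation.position.latitude'),
    ('navigation.speedOverGround'),
    ('navigation.state');
```

With a layout like this, the ingest code would look up the path_id for each incoming path and write the reading into value_num or value_text depending on its type, so queries no longer need a cast per row.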
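For the "pivot" query, I'd suggest using the FILTER clause on your aggregates, see: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES. That should at least get rid of the CASE expressions. If you like, you can also use a HAVING clause to avoid another subselect.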
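A small, self-contained sketch of the FILTER and HAVING idea, assuming plain PostgreSQL so that no extension is needed: date_trunc('minute', ...) stands in for time_bucket(), and (array_agg(value ORDER BY "time" DESC) FILTER (...))[1] stands in for TimescaleDB's last(value, "time"). There is no gap filling here. The table kv_sample, the view kv_pivot and the sample rows are made up for illustration:

```sql
-- Hypothetical sample data in the same key/value shape as key_value_data
CREATE TEMP TABLE kv_sample (
    "time"  timestamptz NOT NULL,
    context text        NOT NULL,
    path    text        NOT NULL,
    value   text        NOT NULL
);

INSERT INTO kv_sample VALUES
    ('2021-08-12 10:00:01+00', 'vessel-a', 'navigation.position.longitude', '4.50'),
    ('2021-08-12 10:00:02+00', 'vessel-a', 'navigation.position.latitude',  '52.10'),
    ('2021-08-12 10:00:03+00', 'vessel-a', 'navigation.speedOverGround',    '2.5'),
    ('2021-08-12 10:00:04+00', 'vessel-a', 'navigation.state',              'underway'),
    ('2021-08-12 10:00:30+00', 'vessel-a', 'navigation.position.longitude', '4.51'),
    ('2021-08-12 10:00:01+00', 'vessel-b', 'navigation.position.longitude', '5.00');

-- Wrapped in a temp view only so the result can be queried afterwards
CREATE TEMP VIEW kv_pivot AS
SELECT
    date_trunc('minute', "time") AS bucket,  -- plain-PostgreSQL stand-in for time_bucket()
    context,
    -- Each FILTER keeps one path's rows; the newest-first array's first element is the last value
    ((array_agg(value ORDER BY "time" DESC)
        FILTER (WHERE path = 'navigation.position.longitude'))[1])::float8 AS lon,
    ((array_agg(value ORDER BY "time" DESC)
        FILTER (WHERE path = 'navigation.position.latitude'))[1])::float8 AS lat,
    ((array_agg(value ORDER BY "time" DESC)
        FILTER (WHERE path = 'navigation.speedOverGround'))[1])::float8 * 3.6 AS sog,  -- m/s to km/h
    (array_agg(value ORDER BY "time" DESC)
        FILTER (WHERE path = 'navigation.state'))[1] AS navstate
FROM kv_sample
WHERE path IN ('navigation.position.longitude', 'navigation.position.latitude',
               'navigation.speedOverGround', 'navigation.state')
GROUP BY 1, 2
-- HAVING drops buckets missing any of the four paths, without another wrapping subselect
HAVING count(DISTINCT path) = 4;

SELECT * FROM kv_pivot ORDER BY bucket, context;
```

With this sample the view returns a single row for vessel-a (lon 4.51, the newer of its two longitudes): vessel-b only reported a longitude in that minute, so the HAVING clause removes it.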

The query might look something like this:

select
    time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
    "context",
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.longitude')) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.latitude')) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.speedOverGround')) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.state')) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket
from
    "key_value_data"
where
  $__timeFilter("time") and
    "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
GROUP BY 
    1, 2
order by
    1, 2, 3
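I would generally avoid varchar; text is better in PG, and varchar adds extra overhead (although that is mainly the case for varchar(n), so it is probably not a big issue here; I just prefer text). See: https://wiki.postgresql.org/wiki/Don't_Do_This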

Comments:

Thanks for the great reply! I'll try these things and report back.
