New to time series databases, selecting data from TimescaleDB for Grafana is slow and the query is complex
Posted: 2021-08-12 16:06:06
Question:
I am new to TimescaleDB and Grafana, and the query I use is slow. I think the query I have written could be improved a lot, but I am not sure how. Any advice is appreciated.
I have the following table:
create table key_value_data (
    "time" timestamptz not null,
    "context" varchar null,
    "key" varchar[] not null,
    "path" varchar not null,
    "value" varchar not null
);
select create_hypertable('key_value_data', 'time');
Then I try to select data for https://github.com/panodata/grafana-map-panel. I have tried to explain what I want to achieve in the comments of the query:
select
    * -- this removes all rows with a null value in one of the columns, see where statement below
from
    (
        select
            time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
            "name", -- the name of the vessel
            locf(last("lon", "time")) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
            locf(last("lat", "time")) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
            locf(last("sog", "time")) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
            locf(last("navstate", "time")) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket
        from
            (
                select
                    "ais"."time",
                    case when "names"."name" is null then "ais"."context" else "names"."name" end as "name",
                    max(case when "ais"."path" = 'navigation.position.longitude' then "ais"."value"::numeric else null end) as "lon",
                    max(case when "ais"."path" = 'navigation.position.latitude' then "ais"."value"::numeric else null end) as "lat",
                    max(case when "ais"."path" = 'navigation.speedOverGround' then "ais"."value"::numeric * 3.6 else null end) as "sog",
                    max(case when "ais"."path" = 'navigation.state' then "ais"."value"::varchar else null end) as "navstate"
                from
                    (
                        select
                            "time",
                            "context",
                            "path",
                            "value"
                        from
                            "key_value_data"
                        where
                            $__timeFilter("time") and
                            "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
                        order by
                            1, 2, 3
                    ) as "ais" -- this is a subquery to pivot the data, I cannot get the crosstab function working because I don't know how to insert the grafana $__timeFilter in the query text
                inner join
                    (
                        select
                            "context",
                            last("value", "time") as "name"
                        from
                            "key_value_data" as "names"
                        where
                            $__timeFilter("time") and
                            "path" = 'name'
                        group by
                            1
                    ) as "names" -- I made a separate query to retrieve the name of the vessel because this value is not injected in the table every x seconds but less frequent
                    on "ais"."context" = "names"."context"
                group by
                    1, 2
            ) as "bucket"
        where
            $__timeFilter("time")
        group by
            1, 2
    ) as "result"
where
    "lon" is not null and -- remove all rows with a null value in one of these columns
    "lat" is not null and
    "sog" is not null and
    "navstate" is not null
The query I ended up with is both complex and slow, and I think there should be an easier way to do this.
Successfully run. Total query runtime: 465 msec.
106233 rows affected.
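Questions:
- Is the key-value approach a good way to store data in the key_value_data table? I do not know beforehand which keys are available; that depends on the sensors available on the ship.
- Is there an easier way to pivot the data that still works with Grafana's $__timeFilter function?
- Is pivoting the data needed at all, or can Grafana handle key-value data without pivoting?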
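Comments:
465 ms is not very fast, but it is not that slow either, and you are processing 106233 records; it could be a hardware issue. Can you show us the result of EXPLAIN (ANALYZE, BUFFERS, VERBOSE) for this query?
For the query plan, see gist.github.com/munnik/89a160a65454dd71f7e373459cf1a89b
Answer 1:
A few things: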
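- The key-value approach can be a decent way to store this data, though I would make some modifications:
  - I would definitely break the name data out into a separate table and filter those messages separately. Your query already treats them differently, and you should handle that separation at ingest time rather than at query time.
  - You should probably think about indexing, likely an index on (path, time). This will matter more as the data grows, but it looks like most of your time is spent scanning the table: right now it only has an index on time, so every other key has to be filtered out row by row. You could also consider a separate table for your keys so that the path can be stored as an integer or something similar.
  - Casting varchar or text to numeric is probably a poor choice: it costs a lot in storage and adds overhead. I would suggest multiple columns that store the actual types you want to store. Avoid numeric if you can and use floats for performance, unless you really need full precision for some reason. (Also, if you enable compression, float/double will compress much better than numeric.)
- Pivoting gets simpler with the aggregate FILTER clause, see: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES. That should at least get rid of the case statements. You can also use a HAVING clause to avoid the other sub-select if you want.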
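The indexing suggestion could be sketched as follows. The index name is made up for illustration, and the descending order on "time" is an assumption that tends to suit "most recent value" lookups; whether the planner actually uses the index is worth checking with EXPLAIN (ANALYZE, BUFFERS).

```sql
-- Hypothetical index name; the column order follows the (path, time) suggestion.
-- Creating an index on the hypertable also creates it on its chunks.
create index key_value_data_path_time_idx
    on key_value_data ("path", "time" desc);
```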
The query might look something like:
select
    time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
    "context",
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.longitude')) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.latitude')) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.speedOverGround')) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.state')) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket
from
    "key_value_data"
where
    $__timeFilter("time") and
    "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
GROUP BY
    1, 2
order by
    1, 2
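Since "value" is stored as varchar, the pivoted columns in the query above come back as text, whereas the query in the question returned numeric coordinates, multiplied the speed by 3.6, and dropped rows with missing values. A minimal sketch of how those steps might be layered on top of the FILTER-based pivot, assuming a cast to double precision is acceptable (in line with the advice to prefer floats). The alias "pivoted" is made up here, the sketch keeps one sub-select rather than using HAVING, and it has not been run against the asker's data.

```sql
select
    "time",
    "context",
    "lon"::double precision as "lon",        -- cast only after pivoting, so state strings are never cast
    "lat"::double precision as "lat",
    "sog"::double precision * 3.6 as "sog",  -- same factor as in the question's query
    "navstate"
from
    (
        select
            time_bucket_gapfill('5 seconds', "time") as "time",
            "context",
            locf(last("value", "time") filter (where "path" = 'navigation.position.longitude')) as "lon",
            locf(last("value", "time") filter (where "path" = 'navigation.position.latitude')) as "lat",
            locf(last("value", "time") filter (where "path" = 'navigation.speedOverGround')) as "sog",
            locf(last("value", "time") filter (where "path" = 'navigation.state')) as "navstate"
        from
            "key_value_data"
        where
            $__timeFilter("time") and
            "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
        group by
            1, 2
    ) as "pivoted" -- hypothetical alias
where
    "lon" is not null and -- drop buckets that still have no carried-forward value
    "lat" is not null and
    "sog" is not null and
    "navstate" is not null
order by
    1, 2
```

This version groups by "context" only; joining in the vessel name is left out on purpose, because the answer recommends moving names into their own table at ingest time.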
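I would generally avoid varchar; text is the better choice in PG, and varchar adds extra overhead (though that is mostly the case for varchar(n), so it is probably not a big deal here, I just prefer text). See: https://wiki.postgresql.org/wiki/Don't_Do_This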
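Taken together, the storage suggestions in this answer (a separate table for names, typed value columns, float instead of numeric, text instead of varchar) might lead to a layout like the one below. This is only a sketch: every table and column name is hypothetical, and the split into one double precision column and one text column is an assumption about which value types the sensors actually report.

```sql
-- Hypothetical names throughout; only the overall shape follows the advice above.

-- Vessel names, written at ingest time instead of being looked up in every query.
create table vessel_names (
    "context" text primary key,
    "name"    text not null
);

-- Measurements with typed value columns; each row is expected to fill one of them.
create table typed_key_value_data (
    "time"         timestamptz not null,
    "context"      text null,
    "key"          text[] not null,
    "path"         text not null,
    "value_double" double precision null, -- e.g. longitude, latitude, speed over ground
    "value_text"   text null              -- e.g. navigation state
);
select create_hypertable('typed_key_value_data', 'time');
```

With a layout like this, the pivot query would read "value_double" or "value_text" directly and would no longer need casts at query time.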
Comments:
Thanks for the great response! I will try these things and report back.