New to time series databases, selecting data from TimescaleDB for Grafana is slow and the query is complex

Posted 2023-02-16

Asked: 2021-08-12 16:06:06

Question:

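I am new to TimescaleDB and Grafana, and the query I am using is slow. I think the query I wrote could be improved a lot, but I am not sure how. Any advice is appreciated.

I have the following table: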

create table key_value_data (
    "time" timestamptz not null,
    "context" varchar null,
    "key" varchar[] not null,
    "path" varchar not null,
    "value" varchar not null
);

select create_hypertable('key_value_data', 'time');

I then try to select data for https://github.com/panodata/grafana-map-panel. I have tried to explain what I want to achieve in the comments of the query:

select
    * -- this removes all rows with a null value in one of the columns, see where statement below
from
(
select
    time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
    "name", -- the name of the vessel
    locf(last("lon", "time")) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("lat", "time")) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("sog", "time")) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("navstate", "time")) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket
from
(
select
     "ais"."time",
    case when "names"."name" is null then "ais"."context" else "names"."name" end as "name",
    max(case when "ais"."path" = 'navigation.position.longitude' then "ais"."value"::numeric else null end) as "lon",
    max(case when "ais"."path" = 'navigation.position.latitude' then "ais"."value"::numeric else null end) as "lat",
    max(case when "ais"."path" = 'navigation.speedOverGround' then "ais"."value"::numeric * 3.6 else null end) as "sog",
    max(case when "ais"."path" = 'navigation.state' then "ais"."value"::varchar else null end) as "navstate"
from
(
select
    "time",
    "context",
    "path",
    "value"
from
    "key_value_data"
where
  $__timeFilter("time") and
    "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
order by
    1, 2, 3
) as "ais" -- this is a subquery to pivot the data, I cannot get the crosstab function working because I don't know how to insert the grafana $__timeFilter in the query text
inner join
(
select
    "context",
    last("value", "time") as "name"
from
    "key_value_data" as "names"
where
  $__timeFilter("time") and
    "path" = 'name'
group by 
    1
) as "names" -- I made a separate query to retrieve the name of the vessel because this value is not injected in the table every x seconds but less frequent
on "ais"."context" = "names"."context"
group by
    1, 2
) as "bucket"
where
  $__timeFilter("time")
group by
    1, 2
) as "result"
where
    "lon" is not null and -- remove all rows with a null value in one of these columns
    "lat" is not null and
    "sog" is not null and
    "navstate" is not null

The query I ended up with is both complex and slow, and I think there should be a simpler way to do this.

Successfully run. Total query runtime: 465 msec.
106233 rows affected.

Questions:

$__timeFilter

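Comments:

465 ms is not very fast, but it is not that slow either. You are also processing 106233 records, so this could be a hardware issue. Could you show us the result of EXPLAIN(ANALYZE, BUFFERS, VERBOSE) for this query?

For the query plan, see gist.github.com/munnik/89a160a65454dd71f7e373459cf1a89b

Answer 1: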

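A few things:

The key-value approach can be a decent way to store this data, though I would make a few modifications:

- I would definitely break the name data out into a separate table and filter those messages separately. Your query already treats them differently, and you should really handle that at ingest time rather than at query time.
- You should probably think about indexing, perhaps one index on (path, time). This will matter more as the data grows, but it looks like most of your time is spent scanning the table. Right now the table only has an index on time, so every other key has to be filtered out row by row.
- You could also consider a separate table for your keys, so that the path can be stored as an integer or something similar.
- Casting varchar or text to numeric is probably a poor choice: it costs a lot of storage and adds a lot of overhead. I would recommend using multiple columns that store the actual types you want.
- If you can, avoid numeric. Unless you really need full precision for some reason, use floats for better performance. (Also, if you enable compression, float/double compresses much better than numeric.)
- For the "pivot" query, I would recommend the FILTER clause on aggregates, see: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES. That should at least get rid of the case statements.
- If you like, you can also use a HAVING clause to avoid another sub-select.

The query could look something like: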
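The storage suggestions above could be sketched roughly as follows. This is only an illustration under assumptions: the table names vessel_names and ais_position, the index name, and the column layout are hypothetical and do not come from the original post, and the right layout depends on how the data is ingested.

```sql
-- Hypothetical names throughout: vessel_names, ais_position and
-- key_value_data_path_time_idx are illustrative, not from the original post.

-- Index that matches a filter on "path" combined with a time range.
create index if not exists key_value_data_path_time_idx
    on key_value_data ("path", "time" desc);

-- Vessel names split out at ingest time: one row per vessel context.
create table if not exists vessel_names (
    "context" text primary key,
    "name"    text not null
);

-- Typed columns instead of casting a varchar "value" at query time.
-- double precision is used rather than numeric, and text rather than varchar.
create table if not exists ais_position (
    "time"     timestamptz      not null,
    "context"  text             not null,
    "lon"      double precision null,
    "lat"      double precision null,
    "sog"      double precision null,
    "navstate" text             null
);

-- Same call the question uses for key_value_data.
select create_hypertable('ais_position', 'time');
```

With a layout like this, the name lookup becomes a plain join on "context", and the pivot step disappears because each measurement already has its own column.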

select
    time_bucket_gapfill('5 seconds', "time") as "time", -- create time buckets of 5 seconds
    "context",
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.longitude')) as "lon", -- last reported longitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.position.latitude')) as "lat", -- last reported latitude in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.speedOverGround')) as "sog", -- last reported speed over ground in this bucket, if unknown in this bucket take the value of the previous bucket
    locf(last("value", "time") FILTER (WHERE "path" = 'navigation.state')) as "navstate" -- last reported navigation state in this bucket, if unknown in this bucket take the value of the previous bucket

from
    "key_value_data"
where
  $__timeFilter("time") and
    "path" in ('navigation.position.longitude', 'navigation.position.latitude', 'navigation.speedOverGround', 'navigation.state')
GROUP BY 
    1, 2
order by
    1, 2
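To see the FILTER pivot in isolation, here is a small sketch that runs on plain PostgreSQL without TimescaleDB or Grafana. The sample rows and the vessel-a context are made up for illustration, and max() stands in for last() because each path has only one value per timestamp here. The cast is applied to the aggregate result, so the non-numeric navigation.state value is never cast.

```sql
-- Made-up sample rows in the same shape as key_value_data.
with sample("time", "context", "path", "value") as (
    values
        (timestamptz '2021-08-12 16:00:00+00', 'vessel-a', 'navigation.position.longitude', '4.89'),
        ('2021-08-12 16:00:00+00', 'vessel-a', 'navigation.position.latitude', '52.37'),
        ('2021-08-12 16:00:00+00', 'vessel-a', 'navigation.speedOverGround', '2.5'),
        ('2021-08-12 16:00:00+00', 'vessel-a', 'navigation.state', 'motoring'),
        ('2021-08-12 16:00:05+00', 'vessel-a', 'navigation.position.longitude', '4.90'),
        ('2021-08-12 16:00:05+00', 'vessel-a', 'navigation.position.latitude', '52.38')
)
select
    "time",
    "context",
    -- Each aggregate only sees the rows whose path matches its FILTER.
    (max("value") filter (where "path" = 'navigation.position.longitude'))::double precision as "lon",
    (max("value") filter (where "path" = 'navigation.position.latitude'))::double precision as "lat",
    -- Convert m/s to km/h, as in the question.
    (max("value") filter (where "path" = 'navigation.speedOverGround'))::double precision * 3.6 as "sog",
    max("value") filter (where "path" = 'navigation.state') as "navstate"
from
    sample
group by
    1, 2
order by
    1, 2;
```

The result has one row per ("time", "context") pair, and a column is null wherever no row with the matching path exists in that group. In the real query that gap is what locf() fills in from the previous bucket.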

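I would generally avoid varchar. text is the better choice in PG, and varchar adds extra overhead (though that mostly applies to varchar(n), so it is probably not a big problem here; I just prefer text). See: https://wiki.postgresql.org/wiki/Don't_Do_This

Comments:

Thanks for the great reply! I will try these things and report back.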
