Postgres: getting the maximum and minimum values, and timestamps they occur

Posted: 2017-05-15 21:12:09

Question:

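I'm running Postgres 9.2 and have a table of temperatures and timestamps, with one reading per minute and the timestamp stored in epoch milliseconds: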

weather=# \d weather_data
      Table "public.weather_data"
   Column    |     Type     | Modifiers 
-------------+--------------+-----------
 timestamp   | bigint       | not null
 sensor_id   | integer      | not null
 temperature | numeric(4,1) | 
 humidity    | integer      | 
 date        | date         | not null
Indexes:
    "weather_data_pkey" PRIMARY KEY, btree ("timestamp", sensor_id)
    "weather_data_date_idx" btree (date)
    "weather_data_humidity_idx" btree (humidity)
    "weather_data_sensor_id_idx" btree (sensor_id)
    "weather_data_temperature_idx" btree (temperature)
    "weather_data_time_idx" btree ("timestamp")
Foreign-key constraints:
    "weather_data_sensor_id_fkey" FOREIGN KEY (sensor_id) REFERENCES weather_sensors(sensor_id)

weather=# select * from weather_data order by timestamp desc;
   timestamp   | sensor_id | temperature | humidity |    date    
---------------+-----------+-------------+----------+------------
 1483272420000 |         2 |        22.3 |       57 | 2017-01-01
 1483272420000 |         1 |        24.9 |       53 | 2017-01-01
 1483272360000 |         2 |        22.3 |       57 | 2017-01-01
 1483272360000 |         1 |        24.9 |       58 | 2017-01-01
 1483272300000 |         2 |        22.4 |       57 | 2017-01-01
 1483272300000 |         1 |        24.9 |       57 | 2017-01-01
[...]

I have this existing query, which gets the high and low for each day, but not the specific time at which that high or low occurred:

WITH t AS (
    SELECT date, highest, lowest
    FROM (
        SELECT date, max(temperature) AS highest
        FROM weather_data
        WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
        GROUP BY date
        ORDER BY date ASC
    ) h
    INNER JOIN (
        SELECT date, min(temperature) AS lowest
        FROM weather_data
        WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
        GROUP BY date
        ORDER BY date ASC
    ) l
    USING (date)
    ORDER BY date DESC
)
SELECT * from t ORDER BY date ASC;

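There are a bit over 2 million rows in the database, and the query takes about 1.2 seconds to run, which isn't too bad. I now want to get the specific time of each high or low. I came up with this using window functions, which does work but takes about 5.6 seconds: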

SELECT h.date, high_time, high_temp, low_time, low_temp FROM (
    SELECT date, high_temp, high_time FROM (
        SELECT date, temperature AS high_temp, timestamp AS high_time, row_number()
        OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC)
        FROM weather_data
        WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
    ) highs
    WHERE row_number = 1
) h
INNER JOIN (
    SELECT * FROM (
        SELECT date, temperature AS low_temp, timestamp AS low_time, row_number()
        OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC)
        FROM weather_data
        WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
    ) lows
    WHERE row_number = 1
) l
ON h.date = l.date
ORDER BY h.date ASC;

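Is there some relatively simple addition I can make to the first query that won't add a massive amount of execution time? I assume there is, but I think I've reached the point where I've been looking at this problem for too long!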

Comments:

Possible duplicate of "PostgreSQL - fetch the row which has the Max value for a column".

Unrelated, but: the order by in the derived tables of the first query is useless.

@a_horse_with_no_name Noted, thanks!

Answer 1:
SELECT  
        DISTINCT ON (zdate) zdate
        , first_value(ztimestamp) OVER www AS stamp_at_min
        , first_value(temperature) OVER www AS tmin
        , last_value(ztimestamp) OVER www AS stamp_at_max
        , last_value(temperature) OVER www AS tmax
FROM weather_data
WHERE sensor_id = 2
WINDOW www AS (PARTITION BY zdate ORDER BY temperature, ztimestamp
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                )
        ;

I prefixed the date and timestamp columns with z (zdate, ztimestamp). I added ztimestamp to the ordering as a tie-breaker.
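For readers who want to run this against the table exactly as shown in the question, here is a hedged adaptation, not part of the original answer. It assumes the unprefixed column names "date" and "timestamp" (quoted here because both are also type names) and reuses the question's sensor lookup in place of the hard-coded sensor_id. The explicit frame matters: with an ORDER BY in the window and no frame clause, the default frame ends at the current row (and its peers), so last_value would not reach the end of the partition.

```sql
-- Sketch only: same technique as the answer above, using the question's column names.
SELECT DISTINCT ON ("date")
       "date"
     , first_value("timestamp") OVER w AS stamp_at_min   -- first row in the ordering = lowest temperature
     , first_value(temperature) OVER w AS tmin
     , last_value("timestamp")  OVER w AS stamp_at_max   -- last row in the ordering = highest temperature
     , last_value(temperature)  OVER w AS tmax
FROM weather_data
WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
WINDOW w AS (PARTITION BY "date" ORDER BY temperature, "timestamp"
             -- widen the frame to the whole partition so last_value sees the final row
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY "date";
```

With this ordering, ties on the minimum resolve to the earliest timestamp and ties on the maximum resolve to the latest one. Note also that an ascending sort places NULL temperatures last by default, so a day containing a NULL reading would report NULL as its maximum unless those rows are filtered out.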

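Comments:

Works great, thanks! Are there any additional index-related tricks to speed it up (it takes about 3.7 seconds to run), or is there not much left to optimize in this case?

Your table basically has two candidate keys: your PK, and the possibly not entirely unique zdate, sensor_id, temperature, .... In any case, I think you should get rid of the single-column indexes. Also, zdate is probably functionally dependent on ztimestamp (which could be a timestamp rather than an int).

Get rid of the single-column indexes? Interesting. I run a number of other (simpler) unrelated queries on this table, and I'd guess those would end up significantly slower without the indexes, wouldn't they?

I don't know your other queries... For this particular query, I would go for sensor_id, zdate, temperature, ...

I gave it a try, but it didn't make any noticeable difference. I'll probably just have to live with it for a while. :) Thanks again!

Answer 2: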

This does the same as your second query, but needs only a single scan over the weather_data table:

select date, 
       max(case when high_rn = 1 then timestamp end) as high_time, 
       max(case when high_rn = 1 then temperature end) as high_temp, 
       max(case when low_rn = 1 then timestamp end) as low_time, 
       max(case when low_rn = 1 then temperature end) as low_temp
from (
  select timestamp, temperature, date, 
         row_number() OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC) as high_rn,
         row_number() OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC) as low_rn
  from weather_data
  where sensor_id = ...
) t
where (high_rn = 1 or low_rn = 1)
group by date;   

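It uses conditional aggregation to do a crosstab (also known as "pivot") query on a result that contains only the rows with the lowest and highest temperatures.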
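To make the pivot step concrete, here is a small self-contained sketch that is not part of the original answer. It runs against two made-up rows supplied through VALUES; the alias r and the columns reading_date and kind are illustrative assumptions only. Each CASE expression yields NULL for rows that do not match, and max() ignores NULLs, so every output column picks up the value from the one matching row.

```sql
-- Illustrative only: two hypothetical rows for one day, pivoted into one output row.
SELECT reading_date,
       max(CASE WHEN kind = 'high' THEN temperature END) AS high_temp,  -- value from the 'high' row
       max(CASE WHEN kind = 'low'  THEN temperature END) AS low_temp    -- value from the 'low' row
FROM (VALUES
        (DATE '2017-01-01', 'high', 24.9),
        (DATE '2017-01-01', 'low',  22.3)
     ) AS r (reading_date, kind, temperature)
GROUP BY reading_date;
```

In the answer's query, the conditions high_rn = 1 and low_rn = 1 play the role that kind plays here.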


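Unrelated, but: date and timestamp are horrible names for columns. For one thing, they are keywords, but more importantly they do not document what the column actually contains. Is it a "due date"? A "reading date"? A "processing date"?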

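Comments:

Thanks! This one takes about 5.2 seconds to run, versus 3.7 seconds for the one above. The columns hold the time and date at which a particular temperature reading was taken, so reading date and reading time, I guess. This is a personal project that only I work on (just keeping track of the current temperature inside and outside my house). :)

Ha, I just remembered that I need to add a temperature != 21.8, because the temperature sensor occasionally goes weird and sends a value of 21.8 to my app. After adding a subquery for the window functions to run over to @wildplasser's query, and adding a simple where temperature != 21.8 to yours, they both now run in about 100 ms!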
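The exact queries the asker ended up with are not shown in the thread. As a rough sketch of what adding that filter to the second answer's query might look like, assuming the question's column names and its sensor lookup:

```sql
-- Sketch under assumptions: the second answer's query with the bad sensor value filtered out.
SELECT "date",
       max(CASE WHEN high_rn = 1 THEN "timestamp" END) AS high_time,
       max(CASE WHEN high_rn = 1 THEN temperature END) AS high_temp,
       max(CASE WHEN low_rn = 1 THEN "timestamp" END)  AS low_time,
       max(CASE WHEN low_rn = 1 THEN temperature END)  AS low_temp
FROM (
  SELECT "timestamp", temperature, "date",
         row_number() OVER (PARTITION BY "date" ORDER BY temperature DESC, "timestamp" DESC) AS high_rn,
         row_number() OVER (PARTITION BY "date" ORDER BY temperature ASC,  "timestamp" DESC) AS low_rn
  FROM weather_data
  WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor')
    AND temperature <> 21.8   -- drop the spurious reading before the window functions run
) t
WHERE high_rn = 1 OR low_rn = 1
GROUP BY "date"
ORDER BY "date";
```

One side effect worth knowing: because a comparison with NULL yields NULL, this predicate also removes rows whose temperature is NULL. That is usually desirable here, since a descending sort places NULLs first by default and such a row would otherwise be numbered high_rn = 1.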
