Pivot with row_number() in PostgreSQL

Posted: 2020-09-20 21:22:44

Question:

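I have a table schema in PostgreSQL that is used to monitor certain devices. Under certain conditions a device enters the Monitor phase, and if it is still in that state on the next consecutive day, it moves to the Action phase. It stays in the Action phase until the condition improves.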

CREATE TABLE public.monitoring_engine
(
    cell_id character varying(1024) COLLATE pg_catalog."default",
    monitoring_started_at timestamp without time zone,
    gap character varying(1024) COLLATE pg_catalog."default",
    status character varying(1024) COLLATE pg_catalog."default"
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public.monitoring_engine
    OWNER to postgres;

Insert statements

INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_A', '2020-09-15', NULL, 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_A', '2020-09-16', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_A', '2020-09-17', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_A', '2020-09-18', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_A', '2020-09-20', '2 days', 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-15', NULL, 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-17', '2 days', 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-18', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-20', '2 days', 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-21', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-23', '2 days', 'Monitor');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-24', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-25', '1 day', 'Action');
INSERT INTO public.monitoring_engine(cell_id, monitoring_started_at, gap, status) VALUES ('Cell_B', '2020-09-26', '1 day', 'Action');

After running the statements above, the table contains the following data.

cell_id         monitoring_started_at       gap     status
Cell_A          9/15/2020 0:00          NULL        Monitor
Cell_A          9/16/2020 0:00          1 day       Action
Cell_A          9/17/2020 0:00          1 day       Action
Cell_A          9/18/2020 0:00          1 day       Action
Cell_A          9/20/2020 0:00          2 days      Monitor
Cell_B          9/15/2020 0:00          NULL        Monitor
Cell_B          9/17/2020 0:00          2 days      Monitor
Cell_B          9/18/2020 0:00          1 day       Action
Cell_B          9/20/2020 0:00          2 days      Monitor
Cell_B          9/21/2020 0:00          1 day       Action
Cell_B          9/23/2020 0:00          2 days      Monitor
Cell_B          9/24/2020 0:00          1 day       Action
Cell_B          9/25/2020 0:00          1 day       Action
Cell_B          9/26/2020 0:00          1 day       Action

Required output

cell_id     monitor_date        first_action_date   last_action_date
Cell_A      9/15/2020 0:00      9/16/2020 0:00      9/18/2020 0:00
Cell_A      9/20/2020 0:00      null                null
Cell_B      9/15/2020 0:00      null                null
Cell_B      9/17/2020 0:00      9/18/2020 0:00      9/18/2020 0:00
Cell_B      9/20/2020 0:00      9/21/2020 0:00      9/21/2020 0:00
Cell_B      9/23/2020 0:00      9/24/2020 0:00      9/26/2020 0:00

The required output depends on consecutive dates. Whenever there is a break in date continuity, the device goes back to the Monitor phase.
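To make "a break in date continuity" concrete, here is a minimal sketch that compares each row with the previous row of the same cell using lag(). It assumes the monitoring_engine table populated as above; the column aliases day_gap and starts_new_run are made up for illustration and are not part of the schema.

```sql
-- Sketch: flag rows that do not follow the previous row by exactly one day.
select
    cell_id,
    monitoring_started_at,
    -- interval since the previous row of the same cell (NULL for the first row)
    monitoring_started_at
        - lag(monitoring_started_at) over w as day_gap,
    -- true for the first row of a cell and for any row after a break
    coalesce(
        monitoring_started_at - lag(monitoring_started_at) over w
            <> interval '1 day',
        true
    ) as starts_new_run
from monitoring_engine
window w as (partition by cell_id order by monitoring_started_at)
order by cell_id, monitoring_started_at;
```

On the sample rows, the rows flagged as starting a new run should be exactly the ones whose status is 'Monitor'.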

This needs to be done in PostgreSQL.

Comments:

SQL Server != PostgreSQL

Answer 1:

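This is a gaps-and-islands problem. I think the simplest approach is to compute, for each row, the date of the most recent 'Monitor' status, and then use that date to group the rows.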
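As a rough sketch of that first step on its own, the query below shows only the window expression, restricted to Cell_A to keep the output short. It assumes the monitoring_engine table and sample rows from the question; the where clause is added here purely for illustration.

```sql
-- Sketch: carry the most recent 'Monitor' date forward onto every row.
-- With an ORDER BY in the window, the default frame runs from the start of
-- the partition to the current row, so max() only sees rows up to this one.
select
    cell_id,
    monitoring_started_at,
    status,
    max(monitoring_started_at)
        filter (where status = 'Monitor')
        over (partition by cell_id order by monitoring_started_at)
        as monitor_date
from monitoring_engine
where cell_id = 'Cell_A'
order by monitoring_started_at;
```

For Cell_A, the four rows from 2020-09-15 through 2020-09-18 all get monitor_date 2020-09-15, and the 2020-09-20 row gets 2020-09-20. Grouping by cell_id and monitor_date then collapses each run into a single row.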

select 
    cell_id, 
    monitor_date, 
    min(monitoring_started_at) filter(where status = 'Action') first_action_date,
    max(monitoring_started_at) filter(where status = 'Action') last_action_date
from (
    select me.*,
        max(monitoring_started_at) 
            filter(where status = 'Monitor') 
            over(partition by cell_id order by monitoring_started_at) 
            as monitor_date
    from monitoring_engine me
) t
group by cell_id, monitor_date
order by cell_id, monitor_date

Demo on DB Fiddle

| cell_id | monitor_date        | first_action_date   | last_action_date    |
| :------ | :------------------ | :------------------ | :------------------ |
| Cell_A  | 2020-09-15 00:00:00 | 2020-09-16 00:00:00 | 2020-09-18 00:00:00 |
| Cell_A  | 2020-09-20 00:00:00 |                     |                     |
| Cell_B  | 2020-09-15 00:00:00 |                     |                     |
| Cell_B  | 2020-09-17 00:00:00 | 2020-09-18 00:00:00 | 2020-09-18 00:00:00 |
| Cell_B  | 2020-09-20 00:00:00 | 2020-09-21 00:00:00 | 2020-09-21 00:00:00 |
| Cell_B  | 2020-09-23 00:00:00 | 2020-09-24 00:00:00 | 2020-09-26 00:00:00 |
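If the status column cannot be relied on, a possible variant (a sketch, not part of the answer above) derives the runs from date continuity alone, as the question describes. It assumes at most one row per cell per day and that timestamps fall exactly on midnight, as in the sample data; the names flagged, runs, starts_new_run and run_no are hypothetical.

```sql
-- Sketch: build islands from date continuity instead of the status column.
with flagged as (
    select
        cell_id,
        monitoring_started_at,
        -- 1 when this row does not follow the previous row of the same
        -- cell by exactly one day (the first row of a cell also gets 1)
        case
            when monitoring_started_at
                 - lag(monitoring_started_at) over w = interval '1 day'
            then 0
            else 1
        end as starts_new_run
    from monitoring_engine
    window w as (partition by cell_id order by monitoring_started_at)
),
runs as (
    select
        f.*,
        -- running count of run starts gives each island its own number
        sum(starts_new_run) over (
            partition by cell_id order by monitoring_started_at
        ) as run_no
    from flagged f
)
select
    cell_id,
    min(monitoring_started_at) as monitor_date,
    min(monitoring_started_at) filter (where starts_new_run = 0) as first_action_date,
    max(monitoring_started_at) filter (where starts_new_run = 0) as last_action_date
from runs
group by cell_id, run_no
order by cell_id, monitor_date;
```

On the sample rows this should produce the same six rows as the table above, since every 'Monitor' row there coincides with a break in the dates.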
