Aggregation + last & first -> losing order


【Title】: Aggregation + last & first -> losing order 【Posted】: 2018-06-11 15:02:24 【Question】:

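I am trying to select data in 15-minute intervals. The main grouping seems to work as expected, but I lose the ordering within each 15-minute group. The reason is, for example: for 4 points whose time_stamp falls in the 0-14 minute range, "floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter" returns the value "0" (as expected). ORDER BY "quarter" then sees 4 rows with "quarter" == "0", and the first and last values are picked from those rows. As a result, I cannot guarantee that the rows within a group are ordered by time stamp.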

SELECT
    first(value) as first_value,
    last(value) as last_value,
    CAST(EXTRACT(year FROM time_stamp) AS INTEGER) AS year,
    CAST(EXTRACT(month FROM time_stamp) AS INTEGER) AS month,
    CAST(EXTRACT(day FROM time_stamp) AS INTEGER) AS day,
    CAST(EXTRACT(hour FROM time_stamp) AS INTEGER) AS hour,
    floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM
    my_table
GROUP BY
    year,
    month,
    day,
    hour,
    quarter
ORDER BY
    year,
    month,
    day,
    hour,
    quarter

Here is the table definition:

CREATE TABLE my_table (
    id integer NOT NULL,
    time_stamp timestamp without time zone NOT NULL,
    value double precision NOT NULL
);


CREATE SEQUENCE my_table_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;


ALTER TABLE ONLY my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq'::regclass);


ALTER TABLE ONLY my_table
    ADD CONSTRAINT my_table_pkey PRIMARY KEY (id);


CREATE INDEX ix_my_table_time_stamp ON my_table USING btree (time_stamp);
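
To reproduce the situation described above, a few rows can be loaded into the table. This is only a sketch: the timestamps and values below are hypothetical and not from the original post. All four rows fall into the same hour and the same quarter (0), and they are inserted out of chronological order on purpose.

```sql
-- Hypothetical sample data: four readings in the 10:00-10:14 bucket,
-- deliberately inserted out of chronological order.
-- The id column is filled by the sequence default defined above.
INSERT INTO my_table (time_stamp, value) VALUES
    ('2018-06-11 10:07:00', 3.0),
    ('2018-06-11 10:01:00', 1.0),
    ('2018-06-11 10:13:00', 4.0),
    ('2018-06-11 10:04:00', 2.0);

-- Every row maps to quarter 0, so sorting by the group columns alone
-- says nothing about the order of the rows inside the bucket.
SELECT time_stamp,
       floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM my_table
ORDER BY time_stamp;
```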

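I also removed the "first" and "last" functions from the query to confirm that the ordering is indeed lost.

Any suggestions on how to keep the ordering within each 15-minute interval?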

【Comments】:

Possible duplicate of First and last value of window function in one row in PostgreSQL 【Answer 1】:

There are no standard aggregate functions first() and last(); you probably mean user-defined aggregates such as:

create or replace function first_agg(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$ select $1; $$;

create or replace function last_agg(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$ select $2; $$;

create aggregate first(anyelement) (
    sfunc = first_agg,
    stype = anyelement
);

create aggregate last(anyelement) (
    sfunc = last_agg,
    stype = anyelement
);
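
A minimal check of how these aggregates behave, assuming the definitions above have been created. Because the transition functions are strict and no initial condition is given, PostgreSQL seeds the state with the first non-null input; first_agg then keeps that state, while last_agg overwrites it with every later input. The result therefore depends entirely on the order in which rows reach the aggregate. The inline rows and the column names ts and v are made up for illustration.

```sql
-- Assumes the first() and last() aggregates defined above exist.
-- The rows in the VALUES list are hypothetical.
SELECT
    first(v ORDER BY ts)      AS earliest,    -- value at the smallest ts
    last(v ORDER BY ts)       AS latest,      -- value at the largest ts
    first(v ORDER BY ts DESC) AS also_latest  -- reversing the order flips the result
FROM (VALUES
    (timestamp '2018-06-11 10:07:00', 3.0),
    (timestamp '2018-06-11 10:01:00', 1.0),
    (timestamp '2018-06-11 10:13:00', 4.0)
) AS t(ts, v);
```

With these three rows, earliest is 1.0 and both latest and also_latest are 4.0. Without an ORDER BY inside the aggregate call, the input order is not guaranteed.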

Use order by inside the aggregate call; see 4.2.7. Aggregate Expressions in the documentation.

SELECT
    first(value order by time_stamp) as first_value,
    last(value order by time_stamp) as last_value,
    CAST(EXTRACT(year FROM time_stamp) AS INTEGER) AS year,
    CAST(EXTRACT(month FROM time_stamp) AS INTEGER) AS month,
    CAST(EXTRACT(day FROM time_stamp) AS INTEGER) AS day,
    CAST(EXTRACT(hour FROM time_stamp) AS INTEGER) AS hour,
    floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM
    my_table
GROUP BY
    year,
    month,
    day,
    hour,
    quarter
ORDER BY
    year,
    month,
    day,
    hour,
    quarter

DbFiddle.
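
If creating custom aggregates is not an option, the same idea can be sketched with the built-in array_agg. This is an alternative, not the method from the answer above, and the bucket_start alias is a name introduced here for illustration. It also collapses the five grouping columns into a single timestamp marking the start of each 15-minute bucket.

```sql
-- Alternative sketch: ordered array_agg instead of user-defined first()/last().
SELECT
    -- Start of the 15-minute bucket: the truncated hour plus 0, 15, 30 or 45 minutes.
    date_trunc('hour', time_stamp)
        + (floor(EXTRACT(minute FROM time_stamp) / 15)::int * 15)
          * interval '1 minute'                     AS bucket_start,
    -- First element of the array sorted ascending = earliest value in the bucket.
    (array_agg(value ORDER BY time_stamp))[1]       AS first_value,
    -- First element of the array sorted descending = latest value in the bucket.
    (array_agg(value ORDER BY time_stamp DESC))[1]  AS last_value
FROM my_table
GROUP BY bucket_start
ORDER BY bucket_start;
```

The trade-off is that array_agg materializes every value of a bucket before one element is picked, so the user-defined aggregates are likely the lighter choice for large groups.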
