Aggregation + last & first -> losing order
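【Title】: Aggregation + last & first -> losing order 【Posted】: 2018-06-11 15:02:24
【Question】: I am trying to select data in 15-minute intervals. The main grouping seems to work as expected, but I lose the ordering inside each 15-minute group. The reason is, for example: for 4 points whose time_stamp falls in the 0-14 minute range, "floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter" returns the value "0" (as expected). ORDER BY "quarter" then sees 4 rows with "quarter" == "0", and the first and last values are picked from those rows. As a result, I have no guarantee that the rows are ordered by timestamp.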
SELECT
first(value) as first_value,
last(value) as last_value,
CAST(EXTRACT(year FROM time_stamp) AS INTEGER) AS year,
CAST(EXTRACT(month FROM time_stamp) AS INTEGER) AS month,
CAST(EXTRACT(day FROM time_stamp) AS INTEGER) AS day,
CAST(EXTRACT(hour FROM time_stamp) AS INTEGER) AS hour,
floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM
my_table
GROUP BY
year,
month,
day,
hour,
quarter
ORDER BY
year,
month,
day,
hour,
quarter
Below is an example of the table:
CREATE TABLE my_table (
id integer NOT NULL,
time_stamp timestamp without time zone NOT NULL,
value double precision NOT NULL
);
CREATE SEQUENCE my_table_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER TABLE ONLY my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq'::regclass);
ALTER TABLE ONLY my_table
ADD CONSTRAINT my_table_pkey PRIMARY KEY (id);
CREATE INDEX ix_my_table_time_stamp ON my_table USING btree (time_stamp);
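To make the situation concrete, here is a small sketch with hypothetical sample rows (the timestamps and values are invented for illustration and are not from the original post). All four rows fall into the same 15-minute bucket and are inserted out of chronological order:

```sql
-- Hypothetical sample rows: four points between 10:00 and 10:14,
-- deliberately inserted out of chronological order.
INSERT INTO my_table (time_stamp, value) VALUES
    ('2018-06-11 10:12:00', 4.0),
    ('2018-06-11 10:01:00', 1.0),
    ('2018-06-11 10:08:00', 3.0),
    ('2018-06-11 10:05:00', 2.0);

-- Every row gets quarter = 0, so the bucket value alone says nothing
-- about the order of the rows inside the group.
SELECT
    time_stamp,
    value,
    floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM
    my_table;
```

Because all four rows share the same grouping key, any aggregate that depends on input order needs the order stated explicitly.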
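I also removed the "first" and "last" functions from the query to confirm that the ordering is indeed lost.
Any suggestions on how to keep the ordering within each 15-minute group?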
【Comments】:
Possible duplicate of First and last value of window function in one row in PostgreSQL
【Answer 1】: There are no standard aggregate functions first() and last(); you probably mean user-defined aggregates such as:
create or replace function first_agg(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$ select $1; $$;
create or replace function last_agg(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$ select $2; $$;
create aggregate first(anyelement) (
sfunc = first_agg,
stype = anyelement
);
create aggregate last(anyelement) (
sfunc = last_agg,
stype = anyelement
);
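As a quick sanity check of these aggregates, here is a sketch on a hypothetical inline row set (the alias s and the columns t and v are made up for this example). Rows reach an aggregate in an unspecified order unless the call itself carries an ORDER BY, so the example orders by t explicitly:

```sql
-- Hypothetical inline rows, listed out of order on purpose.
-- With ORDER BY t inside the call, first() sees the row with the
-- smallest t first and last() ends on the row with the largest t.
SELECT
    first(v ORDER BY t) AS first_v,  -- 'a'
    last(v ORDER BY t)  AS last_v    -- 'c'
FROM
    (VALUES (3, 'c'), (1, 'a'), (2, 'b')) AS s(t, v);
```

Both transition functions are declared strict, so NULL inputs are skipped and the first non-NULL input becomes the initial state.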
Use order by inside the aggregate call; see 4.2.7. Aggregate Expressions in the documentation.
SELECT
first(value order by time_stamp) as first_value,
last(value order by time_stamp) as last_value,
CAST(EXTRACT(year FROM time_stamp) AS INTEGER) AS year,
CAST(EXTRACT(month FROM time_stamp) AS INTEGER) AS month,
CAST(EXTRACT(day FROM time_stamp) AS INTEGER) AS day,
CAST(EXTRACT(hour FROM time_stamp) AS INTEGER) AS hour,
floor(EXTRACT(minute FROM time_stamp) / 15) AS quarter
FROM
my_table
GROUP BY
year,
month,
day,
hour,
quarter
ORDER BY
year,
month,
day,
hour,
quarter
DbFiddle.
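As an alternative that is not part of the answer above, the same result can probably be obtained with built-in functions only, without the user-defined first() and last() aggregates. The sketch below assumes the my_table definition from the question; the column name bucket_start is introduced here for illustration. It collapses the five grouping columns into a single timestamp per 15-minute bucket and takes the first element of an ordered array_agg:

```sql
-- Sketch: first and last value per 15-minute bucket, built-ins only.
SELECT
    -- Start of the 15-minute bucket: the truncated hour plus 0, 15, 30 or 45 minutes.
    date_trunc('hour', time_stamp)
        + CAST(floor(EXTRACT(minute FROM time_stamp) / 15) AS integer)
          * interval '15 minutes' AS bucket_start,
    -- Element 1 of the array sorted ascending by timestamp is the earliest value.
    (array_agg(value ORDER BY time_stamp ASC))[1]  AS first_value,
    -- Element 1 of the array sorted descending by timestamp is the latest value.
    (array_agg(value ORDER BY time_stamp DESC))[1] AS last_value
FROM
    my_table
GROUP BY
    bucket_start
ORDER BY
    bucket_start;
```

This builds an array per group, so for very large groups the user-defined aggregates may be the lighter option; year, month, day, hour and quarter can still be derived from bucket_start if separate columns are needed.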