How can I return the most recent rows at or before $TIMESTAMP at a certain time zone, via a Postgres 11 function (stored proc)?
Posted: 2020-09-13 22:47:27
Question:
I have a Postgres 11 table like this:
CREATE TABLE schema.foo_numbers (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
    quantity INTEGER,
    category TEXT
);
It has some data, for example:
id | created_at | quantity | category
----+------------------------+----------+----------
1 | 2020-01-01 12:00:00+00 | 2 | a
2 | 2020-01-02 17:00:00+00 | 1 | b
3 | 2020-01-01 15:00:00+00 | 6 | a
4 | 2020-01-04 09:00:00+00 | 1 | b
5 | 2020-01-05 19:00:00+00 | 2 | a
6 | 2020-01-06 23:00:00+00 | 8 | b
7 | 2020-01-07 20:00:00+00 | 1 | a
8 | 2020-01-08 04:00:00+00 | 2 | b
9 | 2020-01-09 23:00:00+00 | 1 | a
10 | 2020-01-10 19:00:00+00 | 1 | b
11 | 2020-01-11 05:00:00+00 | 1 | a
12 | 2020-01-12 21:00:00+00 | 1 | b
13 | 2020-01-13 01:00:00+00 | 1 | a
14 | 2020-01-14 18:00:00+00 | 1 | b
I have another table that tracks certain properties of the foo categories:
CREATE TABLE schema.foo_category_properties (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    category TEXT NOT NULL,
    some_bool BOOLEAN NOT NULL DEFAULT FALSE
);
That table has data like this:
id | category | some_bool
----+----------+-----------
1 | a | f
2 | b | f
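To reproduce the setup, the sample rows shown above could be loaded roughly as follows. This is only a sketch: it assumes both tables were just created and are empty, so that the identity columns hand out ids 1 to 14 and 1 to 2 in insertion order.

```sql
-- Assumption: freshly created, empty tables, so the generated ids
-- match the listings above (1..14 and 1..2).
INSERT INTO schema.foo_numbers (created_at, quantity, category) VALUES
    ('2020-01-01 12:00:00+00', 2, 'a'),
    ('2020-01-02 17:00:00+00', 1, 'b'),
    ('2020-01-01 15:00:00+00', 6, 'a'),
    ('2020-01-04 09:00:00+00', 1, 'b'),
    ('2020-01-05 19:00:00+00', 2, 'a'),
    ('2020-01-06 23:00:00+00', 8, 'b'),
    ('2020-01-07 20:00:00+00', 1, 'a'),
    ('2020-01-08 04:00:00+00', 2, 'b'),
    ('2020-01-09 23:00:00+00', 1, 'a'),
    ('2020-01-10 19:00:00+00', 1, 'b'),
    ('2020-01-11 05:00:00+00', 1, 'a'),
    ('2020-01-12 21:00:00+00', 1, 'b'),
    ('2020-01-13 01:00:00+00', 1, 'a'),
    ('2020-01-14 18:00:00+00', 1, 'b');

INSERT INTO schema.foo_category_properties (category, some_bool) VALUES
    ('a', FALSE),
    ('b', FALSE);
```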
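I need to create a postgres function (to be called from application logic via the postgREST api) that, for an argument $TIMESTAMP, returns the most recent record for each category whose created_at is at or before $TIMESTAMP.
Ideally, the incoming argument would be treated as TIMESTAMP WITH TIME ZONE AT TIME ZONE 'America/Los_Angeles', and the function would return the most recent records with their timestamps shown in that same time zone. If that is not possible, it is also acceptable for all timestamps to stay in UTC [and be offset in the application logic], provided the correct data is returned in a consistent manner.
The server time zone is set to UTC: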
psql => show time zone;
TimeZone
----------
UTC
(1 row)
The postgres function I have written looks like this:
CREATE OR REPLACE FUNCTION schema.foo_proc (end_date TEXT)
RETURNS TABLE (
    id INTEGER,
    category TEXT,
    quantity BIGINT,
    snapshot_count NUMERIC,
    latest_entry TIMESTAMP WITH TIME ZONE
)
AS $$
#variable_conflict use_column
BEGIN
    RETURN QUERY
    SELECT
        alias1.id,
        alias1.category,
        alias1.quantity,
        alias1.snapshot_count,
        alias2.latest_entry AS latest_entry
    FROM
    (
        SELECT
            id,
            category,
            quantity,
            sum(quantity) OVER (partition by category ORDER BY created_at) AS snapshot_count
        FROM
            schema.foo_numbers
    ) AS alias1
    INNER JOIN
    (
        SELECT
            max(id) AS id,
            category,
            max(created_at AT TIME ZONE 'America/Los_Angeles') AS latest_entry
        from
            schema.foo_numbers
        WHERE created_at AT TIME ZONE 'America/Los_Angeles' <= to_timestamp($1, 'YYYY-MM-DD HH24:MI:SS') :: TIMESTAMPTZ AT TIME ZONE 'America/Los_Angeles'
        group by category
        order by category
    ) AS alias2
    ON
        alias1.id = alias2.id
    INNER JOIN
        schema.foo_category_properties fcp
    ON
        alias2.category = fcp.category
    WHERE fcp.some_bool IS FALSE
    ORDER BY
        alias1.category
    ;
END;
$$ LANGUAGE plpgsql;
Here is the data in foo_numbers, with the timestamps shifted to the 'America/Los_Angeles' time zone:
psql=> select id, created_at at time zone 'america/los_angeles', quantity, category from schema.foo_numbers order by created_at;
id | timezone | quantity | category
----+---------------------+----------+----------
1 | 2020-01-01 04:00:00 | 2 | a
3 | 2020-01-01 07:00:00 | 6 | a
2 | 2020-01-02 09:00:00 | 1 | b
4 | 2020-01-04 01:00:00 | 1 | b
5 | 2020-01-05 11:00:00 | 2 | a
6 | 2020-01-06 15:00:00 | 8 | b
7 | 2020-01-07 12:00:00 | 1 | a
8 | 2020-01-07 20:00:00 | 2 | b
9 | 2020-01-09 15:00:00 | 1 | a
10 | 2020-01-10 11:00:00 | 1 | b
11 | 2020-01-10 21:00:00 | 1 | a
12 | 2020-01-12 13:00:00 | 1 | b
13 | 2020-01-12 17:00:00 | 1 | a
14 | 2020-01-14 10:00:00 | 1 | b
(14 rows)
For the argument
"end_date":"2020-01-07 19:00:00"
the expected output is:
id | category | quantity | snapshot_count | latest_entry
----+----------+----------+----------------+------------------------
6 | b | 8 | 10 | 2020-01-06 15:00:00
7 | a | 1 | 11 | 2020-01-07 12:00:00
(2 rows)
However, the actual output for the same argument is:
id | category | quantity | snapshot_count | latest_entry
----+----------+----------+----------------+------------------------
5 | a | 2 | 10 | 2020-01-05 19:00:00+00
6 | b | 8 | 10 | 2020-01-06 23:00:00+00
(2 rows)
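The actual rows are consistent with the boundary being applied at 19:00 UTC rather than 19:00 Los Angeles time. One way to check this is to evaluate the pieces of the WHERE clause on their own. The sketch below assumes the session time zone is UTC, as shown earlier; the column aliases are made up for illustration.

```sql
-- Assumption: session time zone is UTC.
SELECT
    -- to_timestamp() reads the text in the session zone:
    -- 2020-01-07 19:00:00+00
    to_timestamp('2020-01-07 19:00:00', 'YYYY-MM-DD HH24:MI:SS') AS parsed,
    -- right-hand side of the WHERE clause, now a plain timestamp:
    -- 2020-01-07 11:00:00
    to_timestamp('2020-01-07 19:00:00', 'YYYY-MM-DD HH24:MI:SS')
        AT TIME ZONE 'America/Los_Angeles' AS boundary_in_function,
    -- left-hand side for row id 7, also a plain timestamp:
    -- 2020-01-07 12:00:00
    timestamptz '2020-01-07 20:00:00+00'
        AT TIME ZONE 'America/Los_Angeles' AS row_7_local;
```

Both sides of the comparison are shifted by the same offset, so the filter behaves like created_at <= 2020-01-07 19:00:00+00. Row 7 (12:00 local against a boundary of 11:00 local) is excluded, which leaves id 5 as the latest row for category a.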
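A similarly unexpected result occurs when the argument is cast to timestamptz at UTC.
In every variation I have tried, the returned rows fail to match the argument boundary correctly.
Clearly I am failing to understand how time zones are handled in PG. I have read the official docs in detail, along with several related questions on SO and a discussion of the to_timestamp() function on the PG forums, but after a great deal of trial and error I have not been able to get correct results.
All guidance is much appreciated!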
Answer 1:
You can use distinct on together with the proper time zone conversions:
select distinct on (n.category)
    n.id,
    -- Los Angeles wall-clock time, relabelled as UTC so it is returned as timestamptz
    n.created_at at time zone 'America/Los_Angeles' at time zone 'utc' created_at,
    n.quantity,
    n.category,
    sum(quantity)
        over (partition by n.category order by n.created_at) as snapshot_count
from foo_numbers n
inner join foo_category_properties cp on cp.category = n.category
-- the literal is stripped to a plain timestamp, then read as Los Angeles local time
where n.created_at <= '2020-01-07 19:00:00'::timestamp with time zone
    at time zone 'utc' at time zone 'America/Los_Angeles'
order by n.category, n.created_at desc
Demo on DB Fiddle:
 id | created_at             | quantity | category | snapshot_count
----+------------------------+----------+----------+----------------
  7 | 2020-01-07 12:00:00+00 |        1 | a        |             11
  6 | 2020-01-06 15:00:00+00 |        8 | b        |             10
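Since the question asks for a function callable through postgREST, the query could be wrapped along these lines. This is a minimal sketch, not the answerer's code: the function name foo_latest_per_category is hypothetical, the declared column types are chosen to match what the query produces (INTEGER, BIGINT, and a plain TIMESTAMP for the local time), and the some_bool filter from the question is added back in.

```sql
-- Hypothetical name; end_date is text such as '2020-01-07 19:00:00',
-- interpreted as Los Angeles local time.
CREATE OR REPLACE FUNCTION schema.foo_latest_per_category(end_date TEXT)
RETURNS TABLE (
    id             INTEGER,
    category       TEXT,
    quantity       INTEGER,
    snapshot_count BIGINT,                      -- sum(integer) yields bigint
    latest_entry   TIMESTAMP WITHOUT TIME ZONE  -- Los Angeles wall-clock time
)
LANGUAGE sql STABLE
AS $$
    SELECT DISTINCT ON (n.category)
        n.id,
        n.category,
        n.quantity,
        -- running total per category over the rows that pass the filter
        sum(n.quantity) OVER (PARTITION BY n.category ORDER BY n.created_at)
            AS snapshot_count,
        -- timestamptz -> plain timestamp in Los Angeles local time
        n.created_at AT TIME ZONE 'America/Los_Angeles' AS latest_entry
    FROM schema.foo_numbers n
    JOIN schema.foo_category_properties cp ON cp.category = n.category
    WHERE cp.some_bool IS FALSE
      -- plain timestamp -> timestamptz, reading the text as Los Angeles time
      AND n.created_at <= (end_date::timestamp AT TIME ZONE 'America/Los_Angeles')
    ORDER BY n.category, n.created_at DESC;  -- latest row per category wins
$$;

SELECT * FROM schema.foo_latest_per_category('2020-01-07 19:00:00');
```

Casting the text to a plain timestamp first means the result does not depend on the session time zone. With the sample data, this should pick id 7 for category a and id 6 for category b, matching the expected output in the question.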
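Comments:
This works great and is far more elegant than my solution! One thing I don't quite understand: why does the time zone need to be declared [converted] twice each time a timestamp is handled, and why is the order of the time zones reversed between the first conversion and the second?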
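On the double conversion: AT TIME ZONE flips the type every time it is applied. Given a timestamptz, it returns a plain timestamp holding the wall-clock time in the named zone; given a plain timestamp, it returns a timestamptz by treating the value as local time in the named zone. The two chains in the answer run in opposite directions, which is why the zones appear in opposite order. The sketch below walks through both chains step by step, assuming the session time zone is UTC; the column aliases are made up for illustration.

```sql
-- Assumption: session time zone is UTC.

-- WHERE clause: turn "19:00 as typed" into "19:00 in Los Angeles".
SELECT
    -- literal read in the session zone: 2020-01-07 19:00:00+00
    '2020-01-07 19:00:00'::timestamptz AS step_0,
    -- timestamptz -> timestamp, zone dropped: 2020-01-07 19:00:00
    '2020-01-07 19:00:00'::timestamptz AT TIME ZONE 'utc' AS step_1,
    -- timestamp -> timestamptz, read as LA local time: 2020-01-08 03:00:00+00
    '2020-01-07 19:00:00'::timestamptz AT TIME ZONE 'utc'
        AT TIME ZONE 'America/Los_Angeles' AS step_2;

-- SELECT list: show a stored instant as Los Angeles wall-clock time.
SELECT
    -- timestamptz -> timestamp in LA local time: 2020-01-07 12:00:00
    timestamptz '2020-01-07 20:00:00+00'
        AT TIME ZONE 'America/Los_Angeles' AS local_la,
    -- timestamp -> timestamptz, relabelled as UTC: 2020-01-07 12:00:00+00
    timestamptz '2020-01-07 20:00:00+00'
        AT TIME ZONE 'America/Los_Angeles' AT TIME ZONE 'utc' AS relabelled;
```

The second step in each chain is only there to get back to a timestamptz. In the WHERE clause, starting from a plain timestamp ('2020-01-07 19:00:00'::timestamp AT TIME ZONE 'America/Los_Angeles') reaches the same boundary in one step.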