PostgreSQL: Window functions using column alias
Posted: 2015-09-21 09:57:44
Question: I have the following table:
Table "public.activity"
Column | Type | Modifiers
------------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('activity_id_seq'::regclass)
scheduleid | integer |
name | text |
duedate | timestamp without time zone |
Indexes:
"activity_pkey" PRIMARY KEY, btree (id)
with the following data:
id | scheduleid | name | duedate
----+------------+----------+----------------------------
1 | 1 | ACT1 | 2015-09-21 13:34:53.738449
2 | 1 | ACT1 | 2015-09-20 13:35:02.770369
3 | 1 | ACT1 | 2015-09-19 13:35:07.650204
4 | 1 | ACT1 | 2015-09-18 13:35:11.930225
5 | 1 | ACT1.0.0 | 2015-09-17 13:35:48.033791
6 | 1 | ACT1.0.0 | 2015-09-16 13:35:51.55382
7 | 2 | ACT2.0.0 | 2015-09-21 13:36:56.42534
8 | 2 | ACT2.0.0 | 2015-09-28 13:37:21.065071
9 | 2 | ACT2.0.0 | 2015-10-05 13:37:26.753227
10 | 2 | ACT2.0.0 | 2015-10-12 13:37:30.656846
11 | 2 | ACT2.0.0 | 2015-10-19 13:37:34.54473
12 | 2 | ACT2.0.0 | 2015-10-26 13:37:38.192843
(12 rows)
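Activities are created for each scheduleid.
I need to show the latest unique activity for each schedule, along with the count of activities under it.
I ran the following query using Postgres window functions: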
WITH TOP_ACTIVITIES AS (
SELECT DISTINCT ON (scheduleid, name)
id, scheduleid, name, duedate,
count(*) over(partition by scheduleid, name) as clubbedcount
from activity ORDER BY scheduleid, name, duedate desc
)
select * from TOP_ACTIVITIES;
The result is:
id | scheduleid | name | duedate | clubbedcount
----+------------+----------+----------------------------+--------------
1 | 1 | ACT1 | 2015-09-21 13:34:53.738449 | 4
5 | 1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 | 2
12 | 2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 | 6
So far so good :P
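The first query works because count(*) OVER (...) is evaluated over every row before DISTINCT ON discards any of them, so each surviving "latest" row still carries the full partition count. The plain-Python sketch below mimics that evaluation order. ACTIVITY copies the question's data with timestamps truncated to seconds, and distinct_on_with_count is a hypothetical name; this illustrates the logical order of operations, not how PostgreSQL actually executes the query.

```python
from collections import Counter

# The question's rows: (id, scheduleid, name, duedate), timestamps truncated to seconds.
ACTIVITY = [
    (1, 1, "ACT1", "2015-09-21 13:34:53"),
    (2, 1, "ACT1", "2015-09-20 13:35:02"),
    (3, 1, "ACT1", "2015-09-19 13:35:07"),
    (4, 1, "ACT1", "2015-09-18 13:35:11"),
    (5, 1, "ACT1.0.0", "2015-09-17 13:35:48"),
    (6, 1, "ACT1.0.0", "2015-09-16 13:35:51"),
    (7, 2, "ACT2.0.0", "2015-09-21 13:36:56"),
    (8, 2, "ACT2.0.0", "2015-09-28 13:37:21"),
    (9, 2, "ACT2.0.0", "2015-10-05 13:37:26"),
    (10, 2, "ACT2.0.0", "2015-10-12 13:37:30"),
    (11, 2, "ACT2.0.0", "2015-10-19 13:37:34"),
    (12, 2, "ACT2.0.0", "2015-10-26 13:37:38"),
]


def distinct_on_with_count(rows):
    # Window step: count(*) OVER (PARTITION BY scheduleid, name) sees all rows.
    counts = Counter((sched, name) for _, sched, name, _ in rows)
    # ORDER BY scheduleid, name, duedate DESC, done as two stable sorts
    # (ISO timestamp strings sort chronologically).
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)
    ordered = sorted(ordered, key=lambda r: (r[1], r[2]))
    # DISTINCT ON (scheduleid, name): keep only the first row of each group.
    seen = set()
    result = []
    for rid, sched, name, due in ordered:
        if (sched, name) not in seen:
            seen.add((sched, name))
            result.append((rid, sched, name, due, counts[(sched, name)]))
    return result
```

Calling distinct_on_with_count(ACTIVITY) keeps ids 1, 5 and 12 with counts 4, 2 and 6, matching the result table in the question.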
Now the small twist is that we also need to group activities by a rangeTag.
E.g., with today's date being 21-Sep-2015:
activities with duedate <= now() --> club under the TODAY tag
activities with duedate <= now() + 7 days --> club under the THIS WEEK tag
activities with duedate <= now() + 1 month --> club under the THIS MONTH tag
ELSE --> club under the FUTURE tag
So we need: 1. the top (latest) activity for each partition defined by rangeTag, scheduleid and name; 2. the count of activities in each partition, clubbed into that top activity.
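The tagging rule can be sketched as a small function that checks the buckets in order, the same way the CASE expression does. This follows the queries' strict < comparisons rather than the <= in the prose. The reference time 2015-09-21 14:00:00 is an assumption, chosen so that the tags agree with the question's output. add_one_month is a hypothetical helper that roughly mimics PostgreSQL's interval '1 month' by moving to the same day of the next month and clamping to that month's last day.

```python
import calendar
from datetime import datetime, timedelta


def add_one_month(ts):
    # Hypothetical helper: same day next month, clamped to the month's last day.
    year = ts.year + ts.month // 12
    month = ts.month % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def range_tag(duedate, now):
    # Buckets are checked in order, like the branches of the CASE expression.
    if duedate < now:
        return "TODAY"
    if duedate < now + timedelta(days=7):
        return "THIS WEEK"
    if duedate < add_one_month(now):
        return "THIS MONTH"
    return "FUTURE"


now = datetime(2015, 9, 21, 14, 0, 0)  # assumed "current" time
print(range_tag(datetime(2015, 9, 28, 13, 37, 21), now))
```

With that reference time, the printed tag for id 8 is THIS WEEK.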
I modified my query slightly to:
WITH TOP_ACTIVITIES AS (
SELECT DISTINCT ON (range, scheduleid, name)
id, scheduleid, name, duedate,
CASE WHEN duedate < now() THEN 'TODAY'
WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
ELSE 'FUTURE'
END AS range,
count(*) over(partition by scheduleid, name)
from activity ORDER BY range, scheduleid, name,duedate desc
)
select * from TOP_ACTIVITIES ORDER BY scheduleid;
This gives me NEARLY the result I want, except for the count :P
id | scheduleid | name | duedate | range | count
----+------------+----------+----------------------------+------------+-------
1 | 1 | ACT1 | 2015-09-21 13:34:53.738449 | TODAY | 4
5 | 1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 | TODAY | 2
12 | 2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 | FUTURE | 6
11 | 2 | ACT2.0.0 | 2015-10-19 13:37:34.54473 | THIS MONTH | 6
8 | 2 | ACT2.0.0 | 2015-09-28 13:37:21.065071 | THIS WEEK | 6
7 | 2 | ACT2.0.0 | 2015-09-21 13:36:56.42534 | TODAY | 6
I also need the count to be partitioned by range.
However, replacing
count(*) over(partition by scheduleid, name)
with
count(*) over(partition by range, scheduleid, name)
does not work.
The error is:
ERROR: column "range" does not exist
LINE 9: count(*) over(partition by range,scheduleid, name)
Comments:
Just like anywhere else, you cannot reference one select-list term from another, or a select-list entry from the WHERE clause, etc. You need a subquery or another CTE term.
Answer 1:
Move the count() (and the DISTINCT ON) into a new query:
WITH top_activities AS (
SELECT
id, scheduleid, name, duedate,
CASE WHEN duedate < now() THEN 'TODAY'
WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
ELSE 'FUTURE'
END AS range
FROM activity
),
top_activities_with_count as (
-- "range" is now an ordinary column of top_activities, so PARTITION BY can use it.
-- DISTINCT ON needs its own ORDER BY here to reliably keep the latest row per group;
-- an ORDER BY inside the previous CTE is not guaranteed to carry over.
SELECT DISTINCT ON (range, scheduleid, name)
*, count(*) over(partition by range, scheduleid, name)
FROM top_activities
ORDER BY range, scheduleid, name, duedate desc
)
SELECT * FROM top_activities_with_count ORDER BY scheduleid;
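The answer's two-step approach can be sketched as a runnable example on SQLite through Python's sqlite3 module, with a few substitutions. ROW_NUMBER() ... = 1 replaces DISTINCT ON, which SQLite lacks. The alias is renamed range_tag because RANGE is a keyword in SQLite. duedate is stored as ISO text truncated to seconds. A fixed :now parameter stands in for now(). The value 2015-09-21 14:00:00 is an assumption chosen to match the tags in the question's output. SQLite 3.25+ is assumed, and top_activities_by_range is a hypothetical name.

```python
import sqlite3

ROWS = [  # the question's data, timestamps truncated to seconds
    (1, 1, "ACT1", "2015-09-21 13:34:53"), (2, 1, "ACT1", "2015-09-20 13:35:02"),
    (3, 1, "ACT1", "2015-09-19 13:35:07"), (4, 1, "ACT1", "2015-09-18 13:35:11"),
    (5, 1, "ACT1.0.0", "2015-09-17 13:35:48"), (6, 1, "ACT1.0.0", "2015-09-16 13:35:51"),
    (7, 2, "ACT2.0.0", "2015-09-21 13:36:56"), (8, 2, "ACT2.0.0", "2015-09-28 13:37:21"),
    (9, 2, "ACT2.0.0", "2015-10-05 13:37:26"), (10, 2, "ACT2.0.0", "2015-10-12 13:37:30"),
    (11, 2, "ACT2.0.0", "2015-10-19 13:37:34"), (12, 2, "ACT2.0.0", "2015-10-26 13:37:38"),
]

QUERY = """
WITH ranged AS (  -- step 1: define the alias
    SELECT id, scheduleid, name, duedate,
           CASE WHEN duedate < :now THEN 'TODAY'
                WHEN duedate < datetime(:now, '+7 days') THEN 'THIS WEEK'
                WHEN duedate < datetime(:now, '+1 month') THEN 'THIS MONTH'
                ELSE 'FUTURE'
           END AS range_tag
    FROM activity
),
numbered AS (  -- step 2: window functions may now use range_tag
    SELECT id, scheduleid, name, duedate, range_tag,
           COUNT(*) OVER (PARTITION BY range_tag, scheduleid, name) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY range_tag, scheduleid, name
                              ORDER BY duedate DESC) AS rn
    FROM ranged
)
SELECT id, scheduleid, name, duedate, range_tag, cnt
FROM numbered
WHERE rn = 1  -- keep the latest row per partition, like DISTINCT ON
ORDER BY scheduleid, duedate DESC
"""


def top_activities_by_range(now):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, "
                 "scheduleid INTEGER, name TEXT, duedate TEXT)")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"now": now}).fetchall()
    conn.close()
    return rows


for row in top_activities_by_range("2015-09-21 14:00:00"):
    print(row)
```

Under these assumptions, schedule 2 now reports separate counts per tag (FUTURE 1, THIS MONTH 3, THIS WEEK 1, TODAY 1) instead of 6 on every row.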