PostgreSQL: Window functions using column alias

Asked: 2015-09-21 09:57:44

I have the following table:

                                    Table "public.activity"

   Column   |            Type             |           Modifiers                       
------------+-----------------------------+-------------------------------------------------------
 id         | integer                     | not null default nextval('activity_id_seq'::regclass)
 scheduleid | integer                     | 
 name       | text                        | 
 duedate    | timestamp without time zone | 
Indexes:
    "activity_pkey" PRIMARY KEY, btree (id)

It contains the following data:

 id | scheduleid |   name   |          duedate           
----+------------+----------+----------------------------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449
  2 |          1 | ACT1     | 2015-09-20 13:35:02.770369
  3 |          1 | ACT1     | 2015-09-19 13:35:07.650204
  4 |          1 | ACT1     | 2015-09-18 13:35:11.930225
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791
  6 |          1 | ACT1.0.0 | 2015-09-16 13:35:51.55382
  7 |          2 | ACT2.0.0 | 2015-09-21 13:36:56.42534
  8 |          2 | ACT2.0.0 | 2015-09-28 13:37:21.065071
  9 |          2 | ACT2.0.0 | 2015-10-05 13:37:26.753227
 10 |          2 | ACT2.0.0 | 2015-10-12 13:37:30.656846
 11 |          2 | ACT2.0.0 | 2015-10-19 13:37:34.54473
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843
(12 rows)

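Activities are created for each scheduleid.

I need to show the latest distinct activity for each schedule, along with the count of activities clubbed under it.

The following query, using Postgres window functions, does that: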

WITH TOP_ACTIVITIES AS (
    SELECT DISTINCT ON (scheduleid, name)
    id, scheduleid, name, duedate,
    count(*) over(partition by scheduleid, name) as clubbedcount
    from activity ORDER BY scheduleid, name, duedate desc
)
select * from TOP_ACTIVITIES;

The result is:

 id | scheduleid |   name   |          duedate           | clubbedcount 
----+------------+----------+----------------------------+--------------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449 |            4
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 |            2
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 |            6

So far so good :P

Now there is a small twist: we also need to group the activities by a rangeTag.

E.g., with today's date being 21-Sep-2015:
activities with duedate <= now() --> club under TODAY tag
activities with duedate <= now() + 7 days --> club under THIS WEEK tag
activities with duedate <= now() + 1 month --> club under THIS MONTH tag
ELSE --> club under FUTURE tag 
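
The tagging rules above can be written as a single CASE expression, because CASE returns the first branch whose condition holds, so each later branch only sees rows that failed the earlier ones. A minimal sketch, assuming a fixed reference timestamp in place of now() so that the tags are reproducible; the `params` CTE and the `ref_ts` name are hypothetical and not part of the original schema:

```sql
-- Sketch only: ref_ts is a hypothetical fixed reference time standing in for now().
WITH params AS (
    SELECT timestamp '2015-09-21 14:00:00' AS ref_ts
)
SELECT a.id, a.duedate,
       -- Branches are tested top to bottom; the first match wins.
       CASE WHEN a.duedate <= p.ref_ts                      THEN 'TODAY'
            WHEN a.duedate <= p.ref_ts + interval '7 days'  THEN 'THIS WEEK'
            WHEN a.duedate <= p.ref_ts + interval '1 month' THEN 'THIS MONTH'
            ELSE 'FUTURE'
       END AS range
FROM activity AS a
CROSS JOIN params AS p
ORDER BY a.duedate;
```

This uses `<=` as in the rules as stated; swapping in `<` only changes how rows that fall exactly on a boundary are tagged.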

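So we need:
1. the top activity for each partition defined by rangeTag, scheduleid and name;
2. the count of activities in each partition, clubbed into that top activity.

So I modified my query slightly to: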

WITH TOP_ACTIVITIES AS (
     SELECT DISTINCT ON (range, scheduleid, name)
     id, scheduleid, name, duedate,

     CASE WHEN duedate < now() THEN 'TODAY'
          WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
          WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
          ELSE 'FUTURE' 
     END AS range,

     count(*) over(partition by scheduleid, name)


     from activity ORDER BY range, scheduleid, name,duedate desc
)
select * from TOP_ACTIVITIES ORDER BY scheduleid;

This gave me NEARLY the desired result, except for the count :P

 id | scheduleid |   name   |          duedate           |   range    | count 
----+------------+----------+----------------------------+------------+-------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449 | TODAY      |     4
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 | TODAY      |     2
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 | FUTURE     |     6
 11 |          2 | ACT2.0.0 | 2015-10-19 13:37:34.54473  | THIS MONTH |     6
  8 |          2 | ACT2.0.0 | 2015-09-28 13:37:21.065071 | THIS WEEK  |     6
  7 |          2 | ACT2.0.0 | 2015-09-21 13:36:56.42534  | TODAY      |     6

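I need the count to be partitioned by range as well.

However, replacing

count(*) over(partition by scheduleid, name)

with

count(*) over(partition by range, scheduleid, name) 

does not work.

The error is:

ERROR:  column "range" does not exist
LINE 9: count(*) over(partition by range,scheduleid, name)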

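Comments:

As everywhere else, you cannot reference one select-list term from another, or reference a select-list entry from the WHERE clause, etc. You need a subquery or another CTE term.

Answer 1:

Move count() (and DISTINCT ON) into a new query: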

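WITH top_activities AS (
    -- Compute the range tag first so that later query levels can reference it by name.
    SELECT 
        id, scheduleid, name, duedate,
        CASE WHEN duedate < now() THEN 'TODAY'
            WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
            WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
            ELSE 'FUTURE'  
        END AS range
    FROM activity
    ),
top_activities_with_count AS (  
    -- Window functions are evaluated before DISTINCT ON, so count(*) sees every row of the partition.
    SELECT DISTINCT ON (range, scheduleid, name)
        *, count(*) OVER (PARTITION BY range, scheduleid, name)
    FROM top_activities
    -- DISTINCT ON keeps the first row of each group, so the ordering has to be applied at this level.
    ORDER BY range, scheduleid, name, duedate DESC
    )
SELECT * FROM top_activities_with_count ORDER BY scheduleid;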
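
The comment above also mentions a plain subquery as an alternative to a second CTE. A minimal sketch of that variant, assuming the same `activity` table; the derived-table alias `tagged` is hypothetical, and `clubbedcount` is borrowed from the first query in the question:

```sql
-- Sketch only: same idea as the CTE version, written with a derived table.
SELECT DISTINCT ON (range, scheduleid, name)
       *,
       -- "range" is visible here because it is a real column of the derived table.
       count(*) OVER (PARTITION BY range, scheduleid, name) AS clubbedcount
FROM (
    SELECT id, scheduleid, name, duedate,
           CASE WHEN duedate < now() THEN 'TODAY'
                WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
                WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
                ELSE 'FUTURE'
           END AS range
    FROM activity
) AS tagged
-- The leading ORDER BY expressions must match DISTINCT ON; duedate DESC picks the latest row.
ORDER BY range, scheduleid, name, duedate DESC;
```

Because the ORDER BY has to start with the DISTINCT ON expressions, the rows come back ordered by range first; wrapping this in one more query level is one way to get a different final ordering, such as by scheduleid.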
