PostgreSQL Select the r.* by MIN() with group-by on two columns

Posted: 2021-04-28 08:30:23

Question

Sample schema and data for a table named results:

id | user_id | activity_id | activity_type_id | start_date_local    | elapsed_time
1  | 100     | 11111       | 1                | 2014-01-07 04:34:38 | 4444
2  | 100     | 22222       | 1                | 2015-04-14 06:44:42 | 5555
3  | 100     | 33333       | 1                | 2015-04-14 06:44:42 | 7777
4  | 100     | 44444       | 2                | 2014-01-07 04:34:38 | 12345
5  | 200     | 55555       | 1                | 2015-12-22 16:32:56 | 5023

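Problem

Select the result of each user's fastest activity (i.e. the minimum elapsed_time) per activity_type_id and year.

(Basically, in this simplified example, the record with id=3 should be excluded from the selection, because the record with id=2 is the fastest for user 100 with activity_type_id 1 in 2015.)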

What I've tried

SELECT user_id,
       activity_type_id,
       EXTRACT(year FROM start_date_local) AS year,
       MIN(elapsed_time) AS fastest_time
FROM results
GROUP BY activity_type_id, user_id, year
ORDER BY activity_type_id, user_id, year;
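One way to get from this aggregate to full rows (a join-back, sketched here as an alternative; it is not taken from the answers) is to keep the GROUP BY query as a subquery and join it back to results on the grouping keys plus the minimum time. The sample rows are inlined as a CTE so the snippet runs on its own, and the view name fastest_by_join is hypothetical; against the real table, drop the CTE.

```sql
CREATE TEMP VIEW fastest_by_join AS
-- Sample rows from the question, inlined so the snippet is self-contained.
WITH results (id, user_id, activity_id, activity_type_id, start_date_local, elapsed_time) AS (
    VALUES (1, 100, 11111, 1, timestamp '2014-01-07 04:34:38', 4444),
           (2, 100, 22222, 1, timestamp '2015-04-14 06:44:42', 5555),
           (3, 100, 33333, 1, timestamp '2015-04-14 06:44:42', 7777),
           (4, 100, 44444, 2, timestamp '2014-01-07 04:34:38', 12345),
           (5, 200, 55555, 1, timestamp '2015-12-22 16:32:56', 5023)
),
fastest AS (
    -- The GROUP BY attempt: one row per user / activity type / year.
    SELECT user_id,
           activity_type_id,
           EXTRACT(year FROM start_date_local) AS year,
           MIN(elapsed_time) AS fastest_time
    FROM results
    GROUP BY user_id, activity_type_id, year
)
-- Join back on the group keys and the minimum time to recover every column.
SELECT r.*, f.year
FROM results r
JOIN fastest f
  ON  f.user_id = r.user_id
  AND f.activity_type_id = r.activity_type_id
  AND f.year = EXTRACT(year FROM r.start_date_local)
  AND f.fastest_time = r.elapsed_time;

SELECT * FROM fastest_by_join ORDER BY activity_type_id, user_id, year;
```

Like a MIN() filter, this returns every row that ties for the fastest time within a group.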

Actual

Selects the correct result set, but it only contains the GROUP BY columns and the aggregate:

user_id | activity_type_id | year | fastest_time
100     | 1                | 2014 | 4444
100     | 1                | 2015 | 5555
100     | 2                | 2014 | 12345
200     | 1                | 2015 | 5023

Goal

Get the actual, complete records with all columns, i.e. results.* + year:

id | user_id | activity_id | activity_type_id | start_date_local    | year | elapsed_time
1  | 100     | 11111       | 1                | 2014-01-07 04:34:38 | 2014 | 4444
2  | 100     | 22222       | 1                | 2015-04-14 06:44:42 | 2015 | 5555
4  | 100     | 44444       | 2                | 2014-01-07 04:34:38 | 2014 | 12345
5  | 200     | 55555       | 1                | 2015-12-22 16:32:56 | 2015 | 5023


Answer 1

You can use a window function for this:

select id, user_id, activity_id, activity_type_id, start_date_local, year, elapsed_time
from (
  SELECT id, 
         user_id,
         activity_id,
         activity_type_id,
         start_date_local,
         EXTRACT(year FROM start_date_local) AS year,
         elapsed_time,
         min(elapsed_time) over (partition by user_id, activity_type_id, EXTRACT(year FROM start_date_local)) as fastest_time
  FROM results
) t
where elapsed_time = fastest_time
order by activity_type_id, user_id, year;
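If two activities tie on the fastest time in the same group, the min() over (...) filter above returns both of them. When exactly one row per group is wanted, a row_number() variant is a common alternative. The sketch below assumes id is an acceptable tie-breaker and adds a hypothetical row 6 that ties with row 2, to show that only one of them survives; the view name fastest_one_per_group is also hypothetical.

```sql
CREATE TEMP VIEW fastest_one_per_group AS
-- Sample rows inlined so the snippet is self-contained.
WITH results (id, user_id, activity_id, activity_type_id, start_date_local, elapsed_time) AS (
    VALUES (1, 100, 11111, 1, timestamp '2014-01-07 04:34:38', 4444),
           (2, 100, 22222, 1, timestamp '2015-04-14 06:44:42', 5555),
           (3, 100, 33333, 1, timestamp '2015-04-14 06:44:42', 7777),
           (4, 100, 44444, 2, timestamp '2014-01-07 04:34:38', 12345),
           (5, 200, 55555, 1, timestamp '2015-12-22 16:32:56', 5023),
           -- Hypothetical extra row that ties with id 2 on elapsed_time.
           (6, 100, 66666, 1, timestamp '2015-06-01 10:00:00', 5555)
)
SELECT id, user_id, activity_id, activity_type_id, start_date_local, year, elapsed_time
FROM (
    SELECT r.*,
           EXTRACT(year FROM r.start_date_local) AS year,
           -- Number rows within each group, fastest first; id breaks ties.
           row_number() OVER (
               PARTITION BY r.user_id, r.activity_type_id, EXTRACT(year FROM r.start_date_local)
               ORDER BY r.elapsed_time, r.id
           ) AS rn
    FROM results r
) t
WHERE rn = 1;

SELECT * FROM fastest_one_per_group ORDER BY activity_type_id, user_id, year;
```

With the min() filter, the (100, 1, 2015) group would return both id 2 and id 6; with row_number() it returns only id 2.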

Or use DISTINCT ON ():

select distinct on (activity_type_id, user_id, extract(year from start_date_local)) 
       id, 
       user_id,
       activity_id,
       activity_type_id,
       start_date_local,
       extract(year from start_date_local) as year,
       elapsed_time
from results
order by activity_type_id, user_id, year, elapsed_time;
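DISTINCT ON keeps only the first row of each group in ORDER BY order, and the leading ORDER BY items must match the DISTINCT ON expressions. If two rows tie on elapsed_time, which one is kept is not guaranteed unless the ORDER BY says more. A sketch that adds id as an assumed tie-breaker, again with a hypothetical tying row 6 and a hypothetical view name fastest_distinct_on:

```sql
CREATE TEMP VIEW fastest_distinct_on AS
-- Sample rows inlined so the snippet is self-contained.
WITH results (id, user_id, activity_id, activity_type_id, start_date_local, elapsed_time) AS (
    VALUES (1, 100, 11111, 1, timestamp '2014-01-07 04:34:38', 4444),
           (2, 100, 22222, 1, timestamp '2015-04-14 06:44:42', 5555),
           (3, 100, 33333, 1, timestamp '2015-04-14 06:44:42', 7777),
           (4, 100, 44444, 2, timestamp '2014-01-07 04:34:38', 12345),
           (5, 200, 55555, 1, timestamp '2015-12-22 16:32:56', 5023),
           -- Hypothetical extra row that ties with id 2 on elapsed_time.
           (6, 100, 66666, 1, timestamp '2015-06-01 10:00:00', 5555)
)
SELECT DISTINCT ON (user_id, activity_type_id, EXTRACT(year FROM start_date_local))
       id,
       user_id,
       activity_id,
       activity_type_id,
       start_date_local,
       EXTRACT(year FROM start_date_local) AS year,
       elapsed_time
FROM results
-- Leading items match the DISTINCT ON list; then fastest first, then id.
ORDER BY user_id, activity_type_id, EXTRACT(year FROM start_date_local), elapsed_time, id;

SELECT * FROM fastest_distinct_on ORDER BY activity_type_id, user_id, year;
```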

Answer 2

I think you want this:

SELECT DISTINCT ON (user_id, activity_type_id, EXTRACT(year FROM start_date_local)) 
     *, EXTRACT(year FROM start_date_local) AS year
FROM results
ORDER BY user_id, activity_type_id, year, elapsed_time;
