Get the maximum value from rows in Postgres records and group by multiple columns

【Posted】: 2021-04-06 00:06:45 【Question】:

I have a table like this:

p_id |      createdat      | pagetitle | sessionid |      text       | device  | deviceserial
------+---------------------+-----------+-----------+-----------------+---------+--------------
      | 2020-11-27 08:07:39 |           |           | App launch      | android | 636363636890
      | 2020-09-01 08:08:18 |           |           | search          | Android | 636363636890
      | 2020-09-02 08:10:10 |           |           | scan            | Android | 636363636890
      | 2020-09-02 08:12:10 |           |           | destroy         | Android | 636363636890
      | 2020-09-02 08:40:11 |           |           | hi              | ios     | 6625839827
      | 2020-09-02 08:45:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:43:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:50:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:47:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:53:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:50:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:55:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:52:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:00:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:55:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:05:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:59:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:07:11 |           |           | hi              | Android | 6625839827
      | 2020-09-02 09:01:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 09:03:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:03:10 |           |           | launchComponent | IOS     | 636363636891
      | 2020-09-02 09:13:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:06:10 |           |           | launchComponent | IOS     | 636363636891
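
To try the queries against this data, a minimal setup could look like the sketch below. The table name devices comes from the query further down; the column types are assumptions (the question does not state them), and only a handful of the rows above are loaded.

```sql
-- Hypothetical schema: the column types are assumed, not given in the question.
create table devices (
    p_id         integer,
    createdat    timestamp,
    pagetitle    text,
    sessionid    text,
    "text"       text,
    device       text,
    deviceserial text
);

-- A few of the sample rows shown above (the empty columns are left as NULL).
insert into devices (createdat, "text", device, deviceserial) values
    ('2020-09-02 08:10:10', 'scan',            'Android', '636363636890'),
    ('2020-09-02 08:12:10', 'destroy',         'Android', '636363636890'),
    ('2020-09-02 09:01:10', 'launchComponent', 'Android', '636363636890'),
    ('2020-09-02 09:09:11', 'hi',              'Android', '6625839828'),
    ('2020-09-02 09:13:11', 'hi',              'Android', '6625839828');
```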

From this table, I want to get a result like this:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 

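Here is what I am doing: I group the records by deviceserial, by hour (as hr), and by device, and I want the maximum event count for each device serial.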

I tried this query:

select deviceserial,max(event_count) as event_count,hr,device
from (
    select deviceserial,count(*) as event_count,
        date_trunc('hour', createdat) as hr,device
    from devices  
    group by deviceserial,hr,device
) t
group by deviceserial,hr,device

This is my result:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 636363636890 |           1 | 2020-11-27 08:00:00 | android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           4 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636890 |           1 | 2020-09-01 08:00:00 | Android
 636363636890 |           2 | 2020-09-02 09:00:00 | Android
 6625839828   |           2 | 2020-09-02 09:00:00 | Android

【Answer 1】:

If I understand you correctly, you can use distinct on:

select distinct on (deviceserial) 
    deviceserial,
    count(*) as event_count,
    date_trunc('hour', createdat) as hr,
    device
from devices  
group by deviceserial, hr, device
order by deviceserial, event_count desc

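This gives you the hour/device with the most events for each device serial. Note, however, that it does not handle ties properly: it returns exactly one row per device serial, even when several hour/device groups share the highest count. If you want to allow top ties, you can use rank() instead: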

select *
from (
    select deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        device,
        -- the alias event_count is not visible inside the window clause, so repeat count(*)
        rank() over(partition by deviceserial order by count(*) desc) rn
    from devices  
    group by deviceserial, hr, device
) t
where rn = 1
order by deviceserial

Or, in Postgres 13 or later:

select deviceserial,
    count(*) as event_count,
    date_trunc('hour', createdat) as hr,
    device
from devices  
group by deviceserial, hr, device
order by rank() over(partition by deviceserial order by count(*) desc)
fetch first row with ties
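
As a hedged variation on the distinct on query above: when two hour/device groups of the same device serial have the same count, which one distinct on keeps is not defined. Adding further sort keys makes the choice repeatable. The preference for the earliest hour, then the device name, is an assumption made here for illustration, not something the question asks for.

```sql
-- Sketch: same DISTINCT ON query, with extra sort keys as a tie-breaker.
-- Preferring the earliest hour (then device name) is an assumed rule.
select distinct on (deviceserial)
    deviceserial,
    count(*) as event_count,
    date_trunc('hour', createdat) as hr,
    device
from devices
group by deviceserial, hr, device
order by deviceserial, event_count desc, hr, device;
```

The leading order by expression still has to match the distinct on expression (deviceserial); the remaining keys only decide which row within each device serial comes first.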

【Answer 2】:

You can use the window function rank() as follows:

select * from
(select deviceserial,count(*) as event_count,
        date_trunc('hour', createdat) as hr, device,
        rank() over (partition by deviceserial order by count(*) desc) as rn
    from devices  
    group by deviceserial,hr,device) t  -- subquery alias (required before Postgres 16)
where rn = 1
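
One thing to watch in the sample data: device is spelled both android/Android and ios/IOS, and group by treats those as different values. If spellings that differ only in case should count as the same device (an assumption; the question does not say so), a sketch of the rank() approach with the value normalized could look like this. The column name device_norm is made up for the example.

```sql
-- Sketch: fold device to lower case before grouping, so 'ios' and 'IOS'
-- land in the same group. device_norm is a hypothetical output name.
select *
from (
    select deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        lower(device) as device_norm,
        rank() over (partition by deviceserial order by count(*) desc) as rn
    from devices
    group by deviceserial, hr, lower(device)
) t
where rn = 1
order by deviceserial;
```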
