Get the maximum value from rows in Postgres records and group by multiple columns
Posted: 2021-04-06 00:06:45

Question: I have a table like this:
p_id | createdat | pagetitle | sessionid | text | device | deviceserial
------+---------------------+-----------+-----------+-----------------+---------+--------------
| 2020-11-27 08:07:39 | | | App launch | android | 636363636890
| 2020-09-01 08:08:18 | | | search | Android | 636363636890
| 2020-09-02 08:10:10 | | | scan | Android | 636363636890
| 2020-09-02 08:12:10 | | | destroy | Android | 636363636890
| 2020-09-02 08:40:11 | | | hi | ios | 6625839827
| 2020-09-02 08:45:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:43:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:50:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:47:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:53:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:50:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:55:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:52:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:00:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:55:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:05:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:59:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:07:11 | | | hi | Android | 6625839827
| 2020-09-02 09:01:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:09:11 | | | hi | IOS | 6625839827
| 2020-09-02 09:03:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:09:11 | | | hi | Android | 6625839828
| 2020-09-02 09:03:10 | | | launchComponent | IOS | 636363636891
| 2020-09-02 09:13:11 | | | hi | Android | 6625839828
| 2020-09-02 09:06:10 | | | launchComponent | IOS | 636363636891
From this table, I want to get a result like this:
deviceserial | event_count | hr                  | device
-------------+-------------+---------------------+---------
6625839828   |           2 | 2020-09-02 09:00:00 | Android
636363636890 |           8 | 2020-09-02 08:00:00 | Android
636363636891 |           2 | 2020-09-02 09:00:00 | IOS
6625839827   |           5 | 2020-09-02 08:00:00 | IOS
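My approach: group the records by deviceserial, by hour (createdat truncated to the hour, as hr) and by device, count the events in each group (event_count), and then keep the group with the maximum count.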
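The two steps described above can be sketched in plain Python. The first step counts events per (deviceserial, hr, device). The second step keeps, for each deviceserial, the group or groups with the largest count. This is an illustration only. It assumes a small hypothetical subset of the rows above, and the helper names hour_bucket and busiest_hours are made up for this sketch. Ties are kept.

```python
from collections import Counter
from datetime import datetime

# Hypothetical subset of the sample table: (createdat, device, deviceserial).
ROWS = [
    ("2020-09-02 08:10:10", "Android", "636363636890"),
    ("2020-09-02 08:12:10", "Android", "636363636890"),
    ("2020-09-02 09:01:10", "Android", "636363636890"),
    ("2020-09-02 09:03:10", "IOS", "636363636891"),
    ("2020-09-02 09:06:10", "IOS", "636363636891"),
]


def hour_bucket(ts: str) -> str:
    """Rough stand-in for date_trunc('hour', createdat) on a text timestamp."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return dt.replace(minute=0, second=0).strftime("%Y-%m-%d %H:%M:%S")


def busiest_hours(rows):
    # Step 1: count events per (deviceserial, hr, device).
    counts = Counter((serial, hour_bucket(ts), device) for ts, device, serial in rows)
    # Step 2a: find the largest count for each deviceserial.
    best = {}
    for (serial, _hr, _device), n in counts.items():
        best[serial] = max(best.get(serial, 0), n)
    # Step 2b: keep every group that reaches that count (ties included).
    return sorted(
        (serial, n, hr, device)
        for (serial, hr, device), n in counts.items()
        if n == best[serial]
    )


if __name__ == "__main__":
    for row in busiest_hours(ROWS):
        print(row)
```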
I tried this query:
select deviceserial,max(event_count) as event_count,hr,device
from (
select deviceserial,count(*) as event_count,
date_trunc('hour', createdat) as hr,device
from devices
group by deviceserial,hr,device
) t
group by deviceserial,hr,device
This is my result:
deviceserial | event_count | hr                  | device
-------------+-------------+---------------------+---------
636363636890 |           1 | 2020-11-27 08:00:00 | android
636363636891 |           2 | 2020-09-02 09:00:00 | IOS
6625839827   |           4 | 2020-09-02 09:00:00 | IOS
6625839827   |           5 | 2020-09-02 08:00:00 | IOS
636363636890 |           8 | 2020-09-02 08:00:00 | Android
636363636890 |           1 | 2020-09-01 08:00:00 | Android
636363636890 |           2 | 2020-09-02 09:00:00 | Android
6625839828   |           2 | 2020-09-02 09:00:00 | Android
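The outer query groups by the same three columns as the inner one. Every outer group therefore contains exactly one inner row, and max(event_count) simply returns that row's count, so nothing collapses to one row per deviceserial. Below is a minimal check of this. It assumes an in-memory SQLite database as a stand-in for Postgres and uses strftime('%Y-%m-%d %H:00:00', ...) in place of date_trunc('hour', ...). The rows are a small subset of the sample table.

```python
import sqlite3

# In-memory SQLite stand-in for the devices table (only the columns the query uses).
conn = sqlite3.connect(":memory:")
conn.execute("create table devices (createdat text, device text, deviceserial text)")
conn.executemany(
    "insert into devices values (?, ?, ?)",
    [
        ("2020-09-02 08:10:10", "Android", "636363636890"),
        ("2020-09-02 08:12:10", "Android", "636363636890"),
        ("2020-09-02 09:01:10", "Android", "636363636890"),
    ],
)

# Inner aggregation: one row per (deviceserial, hr, device).
INNER = """
    select deviceserial, count(*) as event_count,
           strftime('%Y-%m-%d %H:00:00', createdat) as hr, device
    from devices
    group by deviceserial, hr, device
"""

# Same shape as the query in the question: the outer GROUP BY repeats the inner keys.
OUTER = f"""
    select deviceserial, max(event_count) as event_count, hr, device
    from ({INNER}) t
    group by deviceserial, hr, device
"""

inner_rows = conn.execute(INNER).fetchall()
outer_rows = conn.execute(OUTER).fetchall()

# Both return the same rows: the outer max() never sees more than one value per group.
print(sorted(inner_rows))
print(sorted(outer_rows))
```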
Answer 1: If I understand you correctly, you can use distinct on:
select distinct on (deviceserial)
deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device
from devices
group by deviceserial, hr, device
order by deviceserial, event_count desc
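For each deviceserial, this returns the hour and device with the most events. Note, however, that it does not handle ties: it always returns exactly one row per deviceserial. If you want to keep ties, you can use rank() instead: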
select *
from (
select deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device,
-- a window ORDER BY cannot reference the output alias event_count, so repeat count(*)
rank() over(partition by deviceserial order by count(*) desc) rn
from devices
group by deviceserial, hr, device
) t
where rn = 1
order by deviceserial
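The difference is easier to see side by side. DISTINCT ON keeps exactly one row per deviceserial, while rank() = 1 keeps every row tied for the top count. Below is a small Python sketch of both behaviors on hypothetical pre-aggregated counts, where serial "A" has a tie. The function names are illustrative only. In Postgres, which tied row DISTINCT ON keeps is unspecified unless the ORDER BY breaks the tie. Here Python's stable sort happens to keep the first one in input order.

```python
from collections import defaultdict

# Hypothetical grouped counts: (deviceserial, hr, device, event_count).
# Serial "A" has two hours tied at the top count of 3.
GROUPS = [
    ("A", "2020-09-02 08:00:00", "Android", 3),
    ("A", "2020-09-02 09:00:00", "Android", 3),
    ("A", "2020-09-02 10:00:00", "Android", 1),
    ("B", "2020-09-02 09:00:00", "IOS", 2),
]


def one_per_serial(groups):
    """Mimics DISTINCT ON (deviceserial) ... ORDER BY deviceserial, event_count DESC."""
    ordered = sorted(groups, key=lambda g: (g[0], -g[3]))
    seen, out = set(), []
    for g in ordered:
        if g[0] not in seen:  # keep only the first row of each serial
            seen.add(g[0])
            out.append(g)
    return out


def rank_one(groups):
    """Mimics rank() OVER (PARTITION BY deviceserial ORDER BY event_count DESC) = 1."""
    top = defaultdict(int)
    for serial, _hr, _device, n in groups:
        top[serial] = max(top[serial], n)
    # rank 1 means "no row in the same partition has a larger count"
    return [g for g in groups if g[3] == top[g[0]]]


if __name__ == "__main__":
    print(one_per_serial(GROUPS))  # one row for A, one for B
    print(rank_one(GROUPS))        # both tied rows for A, one for B
```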
Or, in Postgres 13 and later:
select deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device
from devices
group by deviceserial, hr, device
order by rank() over(partition by deviceserial order by count(*) desc)
fetch first row with ties
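FETCH FIRST ROW WITH TIES returns the first row plus every following row that compares equal to it under the ORDER BY. Every deviceserial has at least one row with rank 1. Ordering by the rank and keeping the ties of the first row therefore yields exactly the rank-1 rows of all partitions. Below is a rough Python emulation of that semantics. It uses hypothetical pre-aggregated data, and the function names are illustrative only.

```python
# Hypothetical grouped counts: (deviceserial, hr, device, event_count).
GROUPS = [
    ("A", "2020-09-02 08:00:00", "Android", 3),
    ("A", "2020-09-02 09:00:00", "Android", 3),
    ("A", "2020-09-02 10:00:00", "Android", 1),
    ("B", "2020-09-02 09:00:00", "IOS", 2),
]


def rank_within_serial(groups):
    """rank() OVER (PARTITION BY deviceserial ORDER BY event_count DESC):
    1 + the number of rows in the same partition with a strictly larger count."""
    return [
        (g, 1 + sum(1 for h in groups if h[0] == g[0] and h[3] > g[3]))
        for g in groups
    ]


def fetch_first_row_with_ties(groups):
    """ORDER BY <rank> FETCH FIRST ROW WITH TIES: sort by rank, then keep
    every row whose rank equals the first row's rank."""
    ranked = sorted(rank_within_serial(groups), key=lambda pair: pair[1])
    if not ranked:
        return []
    first_rank = ranked[0][1]
    return [g for g, r in ranked if r == first_rank]


if __name__ == "__main__":
    for row in fetch_first_row_with_ties(GROUPS):
        print(row)
```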
Answer 2: You can use the window function rank() as follows:
select * from
(select deviceserial,count(*) as event_count,
date_trunc('hour', createdat) as hr, device,
rank() over (partition by deviceserial order by count(*) desc) as rn
from devices
group by deviceserial,hr,device) t -- a subquery in FROM needs an alias (mandatory before Postgres 16)
where rn = 1
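The rank() approach can be tried outside Postgres with Python's built-in sqlite3 module. This sketch assumes SQLite 3.25 or later, which is needed for window functions, and substitutes strftime('%Y-%m-%d %H:00:00', ...) for date_trunc('hour', ...). The rows are a subset of the sample table above. Serial 636363636890 has two hourly groups, and only its busier one survives the rn = 1 filter.

```python
import sqlite3

# SQLite stand-in for the devices table (only the columns the query uses).
conn = sqlite3.connect(":memory:")
conn.execute("create table devices (createdat text, device text, deviceserial text)")
conn.executemany(
    "insert into devices values (?, ?, ?)",
    [
        ("2020-09-02 09:09:11", "Android", "6625839828"),
        ("2020-09-02 09:13:11", "Android", "6625839828"),
        ("2020-09-02 09:03:10", "IOS", "636363636891"),
        ("2020-09-02 09:06:10", "IOS", "636363636891"),
        ("2020-09-02 08:43:10", "Android", "636363636890"),
        ("2020-09-02 08:47:10", "Android", "636363636890"),
        ("2020-09-02 09:01:10", "Android", "636363636890"),
    ],
)

QUERY = """
    select * from (
        select deviceserial, count(*) as event_count,
               strftime('%Y-%m-%d %H:00:00', createdat) as hr, device,
               -- rank each serial's hourly groups by their event count
               rank() over (partition by deviceserial order by count(*) desc) as rn
        from devices
        group by deviceserial, hr, device
    ) t
    where rn = 1
    order by deviceserial
"""

for row in conn.execute(QUERY):
    print(row)
```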