Select common values when using group by [Postgres]
Posted: 2020-11-24 16:41:38
Question:
I have three main tables (meetings, persons, hobbies) and two relation tables.
Table meetings
+----+----------+
| id | subject  |
+----+----------+
|  1 | Kickoff  |
|  2 | Relaunch |
|  3 | Party    |
+----+----------+
Table persons
+----+-------+
| id | name  |
+----+-------+
|  1 | John  |
|  2 | Anna  |
|  3 | Linda |
+----+-------+
Table hobbies
+----+----------+
| id | name     |
+----+----------+
|  1 | Soccer   |
|  2 | Tennis   |
|  3 | Swimming |
+----+----------+
Relation Table meeting_person
+----+------------+-----------+
| id | meeting_id | person_id |
+----+------------+-----------+
|  1 |          1 |         1 |
|  2 |          1 |         2 |
|  3 |          1 |         3 |
|  4 |          2 |         1 |
|  5 |          2 |         2 |
|  6 |          3 |         1 |
+----+------------+-----------+
Relation Table person_hobby
+----+-----------+----------+
| id | person_id | hobby_id |
+----+-----------+----------+
|  1 |         1 |        1 |
|  2 |         1 |        2 |
|  3 |         1 |        3 |
|  4 |         2 |        1 |
|  5 |         2 |        2 |
|  6 |         3 |        1 |
+----+-----------+----------+
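To reproduce the tables above, a minimal setup script could look like the following. This is only a sketch: the column types, primary keys, and foreign keys are assumptions, since the question shows the data but not the DDL.

```sql
-- Assumed schema: types and constraints are guesses, only the data comes from the question.
create table meetings (id int primary key, subject text not null);
create table persons  (id int primary key, name text not null);
create table hobbies  (id int primary key, name text not null);

create table meeting_person (
    id         int primary key,
    meeting_id int not null references meetings (id),
    person_id  int not null references persons (id)
);

create table person_hobby (
    id        int primary key,
    person_id int not null references persons (id),
    hobby_id  int not null references hobbies (id)
);

insert into meetings (id, subject) values (1, 'Kickoff'), (2, 'Relaunch'), (3, 'Party');
insert into persons  (id, name)    values (1, 'John'), (2, 'Anna'), (3, 'Linda');
insert into hobbies  (id, name)    values (1, 'Soccer'), (2, 'Tennis'), (3, 'Swimming');

insert into meeting_person (id, meeting_id, person_id) values
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2), (6, 3, 1);

insert into person_hobby (id, person_id, hobby_id) values
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2), (6, 3, 1);
```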
Now I want to find, for each meeting, the hobbies that all attendees of that meeting have in common. So the desired result is:
+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
|            | (Aggregated)    | (Aggregated)           |
+------------+-----------------+------------------------+
| 1          | John,Anna,Linda | Soccer                 |
| 2          | John,Anna       | Soccer,Tennis          |
| 3          | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
My current work in progress is:
select
    m.id as "meeting_id",
    (
        select string_agg(distinct p.name, ',')
        from meeting_person mp
        inner join persons p on mp.person_id = p.id
        where m.id = mp.meeting_id
    ) as "persons",
    string_agg(distinct h2.name, ',') as "common_hobbies"
from meetings m
inner join meeting_person mp2 on m.id = mp2.meeting_id
inner join persons p2 on mp2.person_id = p2.id
inner join person_hobby ph2 on p2.id = ph2.person_id
inner join hobbies h2 on ph2.hobby_id = h2.id
group by m.id
But this query does not list the common hobbies. Instead it lists every hobby that is mentioned at least once among the attendees:
+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
+------------+-----------------+------------------------+
| 1          | John,Anna,Linda | Soccer,Tennis,Swimming |
| 2          | John,Anna       | Soccer,Tennis,Swimming |
| 3          | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
Does anyone have a hint on how I can solve this?
Cheers
Comments:
Hint: reverse your approach. Start from meeting and join each table only once until you get a result like meeting.subject, hobby.name, person.name. No subquery is needed for this step.
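A database initialization script would be appreciated.
@Slava Rozhnev: I wrote this post on my office computer; I will post the script tomorrow.
@Mike Organek: I know the subquery isn't really necessary, but my question is a simplified version of a problem from a colleague who used subqueries, and I tried to stick to the way he started. For my question, the "persons" column isn't really needed. Or did I misunderstand you?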
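I am giving you a hint on how to start. In a single query, inner join the five tables once each, with meeting.subject, hobby.name, person.name as the query result. If this is a learning exercise, it is essential to approach SQL problems from a declarative rather than an imperative perspective.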
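The starting point described in this hint might look like the sketch below. It only produces the flat (meeting, hobby, person) rows and does not yet compute the common hobbies; the column aliases and the ordering are illustrative choices, not part of the original comment.

```sql
-- Each of the five tables is joined exactly once; no subquery is involved.
select
    m.subject as meeting,
    h.name    as hobby,
    p.name    as person
from meetings m
inner join meeting_person mp on mp.meeting_id = m.id
inner join persons p         on p.id = mp.person_id
inner join person_hobby ph   on ph.person_id = p.id
inner join hobbies h         on h.id = ph.hobby_id
order by m.id, h.id, p.id;
```

With the sample data, a hobby is common to a meeting exactly when it appears in this result once for every attendee of that meeting.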
Answer 1:
This problem can be solved by implementing a custom aggregate function (found it here):
-- Intersection of two arrays; a NULL argument means "no constraint yet".
create or replace function array_intersect(anyarray, anyarray)
returns anyarray language sql
as $$
    select
        case
            when $1 is null then $2
            when $2 is null then $1
            else
                array(
                    select unnest($1)
                    intersect
                    select unnest($2))
        end;
$$;

-- Aggregate that folds array_intersect over all arrays in a group.
create aggregate array_intersect_agg (anyarray)
(
    sfunc = array_intersect,
    stype = anyarray
);
So the solution can be the following:
select
    meeting_id,
    array_agg(ph.name) persons,
    array_intersect_agg(hobby) common_hobbies
from meeting_person mp
join (
    -- one row per person, with that person's hobbies collected into an array
    select p.id, p.name, array_agg(h.name) hobby
    from person_hobby ph
    join persons p on ph.person_id = p.id
    join hobbies h on h.id = ph.hobby_id
    group by p.id, p.name
) ph on ph.id = mp.person_id
group by meeting_id;
See the example fiddle.
Result:
meeting_id | persons         | common_hobbies
-----------+-----------------+------------------------
1          | John,Anna,Linda | Soccer
3          | John            | Soccer,Tennis,Swimming
2          | John,Anna       | Soccer,Tennis
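As a possible alternative that needs no custom aggregate, the same result can be sketched with plain grouping: a hobby is common to a meeting when the number of distinct attendees who have it equals the number of attendees. This is a different technique from the answer above, not the answer's method. The CTE names attendee_count and common are hypothetical, and the sketch assumes the table layout shown in the question.

```sql
with attendee_count as (
    -- number of distinct attendees per meeting
    select meeting_id, count(distinct person_id) as n
    from meeting_person
    group by meeting_id
),
common as (
    -- keep a hobby only if every attendee of the meeting has it
    select mp.meeting_id, h.id as hobby_id, h.name
    from meeting_person mp
    join person_hobby ph   on ph.person_id = mp.person_id
    join hobbies h         on h.id = ph.hobby_id
    join attendee_count ac on ac.meeting_id = mp.meeting_id
    group by mp.meeting_id, h.id, h.name, ac.n
    having count(distinct mp.person_id) = ac.n
)
select
    mp.meeting_id,
    string_agg(distinct p.name, ',') as persons,
    (
        select string_agg(c.name, ',' order by c.hobby_id)
        from common c
        where c.meeting_id = mp.meeting_id
    ) as common_hobbies
from meeting_person mp
join persons p on p.id = mp.person_id
group by mp.meeting_id
order by mp.meeting_id;
```

Unlike the array-based version, this returns comma-separated strings rather than arrays. A meeting whose attendees share no hobby gets NULL in common_hobbies, and the names in persons come out sorted alphabetically because of distinct.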