Select common values when using group by [Postgres]

Posted 2023-02-16


Asked: 2020-11-24 16:41:38

Question:

I have three main tables (meetings, persons, hobbies) and two relation tables.

Table meetings
+---------------+
| id | subject  |
+----+----------+
|  1 | Kickoff  |
|  2 | Relaunch |
|  3 | Party    |
+----+----------+

Table persons
+------------+
| id | name  |
+----+-------+
|  1 | John  |
|  2 | Anna  |
|  3 | Linda |
+----+-------+

Table hobbies
+---------------+
| id | name     |
+----+----------+
|  1 | Soccer   |
|  2 | Tennis   |
|  3 | Swimming |
+----+----------+

Relation Table meeting_person
+-----------------+-----------+
| id | meeting_id | person_id |
+----+------------+-----------+
|  1 |          1 |         1 |
|  2 |          1 |         2 |
|  3 |          1 |         3 |
|  4 |          2 |         1 |
|  5 |          2 |         2 |
|  6 |          3 |         1 |
+----+------------+-----------+

Relation Table person_hobby
+----------------+----------+
| id | person_id | hobby_id |
+----+-----------+----------+
|  1 |         1 |        1 |
|  2 |         1 |        2 |
|  3 |         1 |        3 |
|  4 |         2 |        1 |
|  5 |         2 |        2 |
|  6 |         3 |        1 |
+----+-----------+----------+
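
The tables above can be recreated with a setup script along these lines. This is only a sketch: the column types, primary keys, and foreign keys are assumptions, since the post shows the data but not the DDL.

```sql
-- Hypothetical setup script; types and constraints are assumed, not taken from the post.
create table meetings (id int primary key, subject text not null);
create table persons  (id int primary key, name text not null);
create table hobbies  (id int primary key, name text not null);

create table meeting_person (
    id         int primary key,
    meeting_id int not null references meetings (id),
    person_id  int not null references persons (id)
);

create table person_hobby (
    id        int primary key,
    person_id int not null references persons (id),
    hobby_id  int not null references hobbies (id)
);

-- Sample rows copied from the tables shown above.
insert into meetings (id, subject) values (1, 'Kickoff'), (2, 'Relaunch'), (3, 'Party');
insert into persons  (id, name)    values (1, 'John'), (2, 'Anna'), (3, 'Linda');
insert into hobbies  (id, name)    values (1, 'Soccer'), (2, 'Tennis'), (3, 'Swimming');

insert into meeting_person (id, meeting_id, person_id) values
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2), (6, 3, 1);

insert into person_hobby (id, person_id, hobby_id) values
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2), (6, 3, 1);
```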

Now I want to find, for each meeting, the hobbies shared by all persons attending it. The desired result is:

+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
|            | (Aggregated)    | (Aggregated)           |
+------------+-----------------+------------------------+
|          1 | John,Anna,Linda | Soccer                 |
|          2 | John,Anna       | Soccer,Tennis          |
|          3 | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+

My current work in progress is:

select
    m.id as "meeting_id", 
    (
        select string_agg(distinct p.name, ',')
        from meeting_person mp
        inner join persons p on mp.person_id = p.id
        where m.id = mp.meeting_id
    ) as "persons",
    string_agg(distinct h2.name , ',') as "common_hobbies"
from meetings m
inner join meeting_person mp2 on m.id = mp2.meeting_id 
inner join persons p2 on mp2.person_id = p2.id
inner join person_hobby ph2 on p2.id = ph2.person_id 
inner join hobbies h2 on ph2.hobby_id = h2.id 
group by m.id

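But this query does not list the common hobbies. It lists every hobby that at least one attendee has, because string_agg(distinct ...) collects the union of the joined rows rather than their intersection.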

+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
+------------+-----------------+------------------------+
|          1 | John,Anna,Linda | Soccer,Tennis,Swimming |
|          2 | John,Anna       | Soccer,Tennis,Swimming |
|          3 | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+

Does anyone have a hint on how I can solve this?

Cheers

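Comments:

Hint: reverse your approach. Start from meeting and join each table only once until you get a result like meeting.subject, hobby.name, person.name. No subquery is needed for this step.

A database initialization script would be appreciated.

@Slava Rozhnev: I wrote this post on my office computer; I will post the script tomorrow.

@Mike Organek: I know the subquery is not really necessary, but my question is a simplified version of a query by a colleague who used subqueries, and I tried to stick with the way he started. For my question, the "persons" column is not really needed. Or did I misunderstand you?

I am giving you a hint on how to start. In a single query, inner join the five tables once each, with meeting.subject, hobby.name, person.name as the query result. If this is a learning exercise, it is essential to approach SQL problems declaratively rather than imperatively.

Answer 1: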

This problem can be solved by implementing a custom aggregate function (found it here):

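-- State transition function: returns the intersection of two arrays.
-- A null argument means "no constraint yet", so the other argument is returned.
create or replace function array_intersect(anyarray, anyarray)
returns anyarray language sql
as $$
    select 
        case 
            when $1 is null then $2
            when $2 is null then $1
            else
                -- keep only the elements present in both arrays
                array(
                    select unnest($1)
                    intersect
                    select unnest($2))
        end;
$$;

-- Aggregate that folds array_intersect over all arrays in a group.
create aggregate array_intersect_agg (anyarray)
(
    sfunc = array_intersect,
    stype = anyarray
);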
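
To see what the aggregate does in isolation, it can be run over a few literal arrays. This is a minimal sketch that assumes the function and aggregate above have already been created; the alias t and the column name hobbies are chosen here for illustration only.

```sql
-- Folds the three arrays pairwise; only 'Soccer' appears in all of them.
select array_intersect_agg(hobbies) as common
from (values
    (array['Soccer', 'Tennis', 'Swimming']),
    (array['Soccer', 'Tennis']),
    (array['Soccer'])
) as t(hobbies);
-- common = {Soccer}
```

Because the function is built on intersect, duplicate elements are removed and the order of elements in the resulting array is not guaranteed.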

So the solution can look like this:

select 
    meeting_id, 
    array_agg(ph.name) as persons, 
    -- intersect the hobby arrays of all attendees of the meeting
    array_intersect_agg(hobby) as common_hobbies
from meeting_person mp
join (
    -- one row per person, with that person's hobbies collected into an array
    select p.id, p.name, array_agg(h.name) as hobby
    from person_hobby ph
    join persons p on ph.person_id = p.id
    join hobbies h on h.id = ph.hobby_id
    group by p.id, p.name
) ph on ph.id = mp.person_id
group by meeting_id;

See the example fiddle.

Result:

meeting_id | persons         | common_hobbies
-----------+-----------------+------------------------
1          | John,Anna,Linda | Soccer
3          | John            | Soccer,Tennis,Swimming
2          | John,Anna       | Soccer,Tennis
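
The same result can probably be reached without a custom aggregate by joining the tables and comparing counts: a hobby is common to a meeting when the number of attendees who have it equals the total number of attendees. The following is a sketch of that idea, not part of the answer above; the CTE names attendance and shared and the column person_count are hypothetical.

```sql
with attendance as (
    -- one row per meeting: how many persons attend, and their names
    select mp.meeting_id,
           count(distinct mp.person_id) as person_count,
           string_agg(p.name, ',')      as persons
    from meeting_person mp
    join persons p on p.id = mp.person_id
    group by mp.meeting_id
),
shared as (
    -- one row per (meeting, hobby) where every attendee has that hobby
    select mp.meeting_id, h.name as hobby
    from meeting_person mp
    join person_hobby ph on ph.person_id = mp.person_id
    join hobbies h       on h.id = ph.hobby_id
    join attendance a    on a.meeting_id = mp.meeting_id
    group by mp.meeting_id, h.id, h.name, a.person_count
    having count(distinct mp.person_id) = a.person_count
)
select a.meeting_id,
       a.persons,
       string_agg(s.hobby, ',') as common_hobbies
from attendance a
left join shared s on s.meeting_id = a.meeting_id  -- keeps meetings with no common hobby
group by a.meeting_id, a.persons
order by a.meeting_id;
```

With the sample data this should yield Soccer for meeting 1, Soccer and Tennis for meeting 2, and all three hobbies for meeting 3, although the order of names inside each aggregated string is not guaranteed unless an order by is added inside string_agg.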
