FULL OUTER JOIN to merge tables with PostgreSQL
【Posted】: 2017-11-17 18:44:00
【Question】: Following this post, I am still running into a problem when I apply the answer given by @Vao Tsun to a larger dataset, this time made of 4 tables instead of the 2 tables in the related post.
Here is my dataset:
-- Table 'brcht' (empty)
insee | annee | nb
-------+--------+-----
-- Table 'cana'
insee | annee | nb
-------+--------+-----
036223 | 2017 | 1
086001 | 2016 | 2
-- Table 'font' (empty)
insee | annee | nb
-------+--------+-----
-- Table 'nr'
insee | annee | nb
-------+--------+-----
036223 | 2013 | 1
036223 | 2014 | 1
086001 | 2013 | 1
086001 | 2014 | 2
086001 | 2015 | 4
086001 | 2016 | 2
Here is the query:
SELECT
COALESCE(brcht.insee, cana.insee, font.insee, nr.insee) AS insee,
COALESCE(brcht.annee, cana.annee, font.annee, nr.annee) AS annee,
COALESCE(brcht.nb,0) AS brcht,
COALESCE(cana.nb,0) AS cana,
COALESCE(font.nb,0) AS font,
COALESCE(nr.nb,0) AS nr,
COALESCE(brcht.nb,0) + COALESCE(cana.nb,0) + COALESCE(font.nb,0) + COALESCE(nr.nb,0) AS total
FROM public.brcht
FULL OUTER JOIN public.cana ON brcht.insee = cana.insee AND brcht.annee = cana.annee
FULL OUTER JOIN public.font ON cana.insee = font.insee AND cana.annee = font.annee
FULL OUTER JOIN public.nr ON font.insee = nr.insee AND font.annee = nr.annee
ORDER BY COALESCE(brcht.insee, cana.insee, font.insee, nr.insee), COALESCE(brcht.annee, cana.annee, font.annee, nr.annee);
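The result still contains two rows for insee = '086001' (year 2016) instead of one. I need a single row for each insee, and in this example the two `2` values should be on the same row, with the total column showing 4.
Thanks again for your help!
Here is a SQL script to easily create the tables above: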
CREATE TABLE public.brcht (insee CHARACTER VARYING(10), annee INTEGER, nb INTEGER);
CREATE TABLE public.cana (insee CHARACTER VARYING(10), annee INTEGER, nb INTEGER);
CREATE TABLE public.font (insee CHARACTER VARYING(10), annee INTEGER, nb INTEGER);
CREATE TABLE public.nr (insee CHARACTER VARYING(10), annee INTEGER, nb INTEGER);
INSERT INTO public.cana (insee, annee, nb) VALUES ('036223', 2017, 1), ('086001', 2016, 2);
INSERT INTO public.nr(insee, annee, nb) VALUES ('036223', 2013, 1), ('036223', 2014, 1), ('086001', 2013, 1), ('086001', 2014, 2), ('086001', 2015, 4), ('086001', 2016, 2);
【Comments】:
【Answer 1】: Inspired by the other answers, but perhaps better organized:
SELECT *,
brcht + cana + font + nr AS total
FROM (SELECT insee,
annee,
SUM(Coalesce(brcht.nb, 0)) brcht,
SUM(Coalesce(cana.nb, 0)) cana,
SUM(Coalesce(font.nb, 0)) font,
SUM(Coalesce(nr.nb, 0)) nr
FROM brcht
full outer join cana USING (insee, annee)
full outer join font USING (insee, annee)
full outer join nr USING (insee, annee)
GROUP BY insee,
annee) t
ORDER BY insee,
annee;
This gives:
insee | annee | brcht | cana | font | nr | total
--------+-------+-------+------+------+----+-------
036223 | 2013 | 0 | 0 | 0 | 1 | 1
036223 | 2014 | 0 | 0 | 0 | 1 | 1
036223 | 2017 | 0 | 1 | 0 | 0 | 1
086001 | 2013 | 0 | 0 | 0 | 1 | 1
086001 | 2014 | 0 | 0 | 0 | 2 | 2
086001 | 2015 | 0 | 0 | 0 | 4 | 4
086001 | 2016 | 0 | 2 | 0 | 2 | 4
(7 rows)
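【Comments】:
Very clear, thank you! I didn't know about the USING clause for joins.
【Answer 2】:
You need a GROUP BY with SUM() over the count columns, wrapped around the query you are using now. (The columns are declared INTEGER in the script above; SUM() returns bigint for them.)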
select
insee, annee
, sum(brcht) brcht
, sum(cana) cana
, sum(font) font
, sum(nr) nr
, sum(total) total
from (
SELECT
COALESCE(brcht.insee, cana.insee, font.insee, nr.insee) AS insee,
COALESCE(brcht.annee, cana.annee, font.annee, nr.annee) AS annee,
COALESCE(brcht.nb,0) AS brcht,
COALESCE(cana.nb,0) AS cana,
COALESCE(font.nb,0) AS font,
COALESCE(nr.nb,0) AS nr,
COALESCE(brcht.nb,0) + COALESCE(cana.nb,0) + COALESCE(font.nb,0) + COALESCE(nr.nb,0) AS total
FROM public.brcht
FULL OUTER JOIN public.cana ON brcht.insee = cana.insee AND brcht.annee = cana.annee
FULL OUTER JOIN public.font ON cana.insee = font.insee AND cana.annee = font.annee
FULL OUTER JOIN public.nr ON font.insee = nr.insee AND font.annee = nr.annee
) d
group by
insee, annee
【Comments】:
【Answer 3】: Try:
t=# SELECT
COALESCE(brcht.insee, cana.insee, font.insee, nr.insee) AS insee,
COALESCE(brcht.annee, cana.annee, font.annee, nr.annee) AS annee,
COALESCE(brcht.nb,0) AS brcht,
COALESCE(cana.nb,0) AS cana,
COALESCE(font.nb,0) AS font,
COALESCE(nr.nb,0) AS nr,
COALESCE(brcht.nb,0) + COALESCE(cana.nb,0) + COALESCE(font.nb,0) + COALESCE(nr.nb,0) AS total
FROM public.brcht
FULL OUTER JOIN public.cana ON brcht.insee = cana.insee AND brcht.annee = cana.annee
FULL OUTER JOIN public.font ON cana.insee = font.insee AND cana.annee = font.annee
FULL OUTER JOIN public.nr ON cana.insee = nr.insee AND cana.annee = nr.annee
ORDER BY COALESCE(brcht.insee, cana.insee, font.insee, nr.insee), COALESCE(brcht.annee, cana.annee, font.annee, nr.annee);
insee | annee | brcht | cana | font | nr | total
--------+-------+-------+------+------+----+-------
036223 | 2013 | 0 | 0 | 0 | 1 | 1
036223 | 2014 | 0 | 0 | 0 | 1 | 1
036223 | 2017 | 0 | 1 | 0 | 0 | 1
086001 | 2013 | 0 | 0 | 0 | 1 | 1
086001 | 2014 | 0 | 0 | 0 | 2 | 2
086001 | 2015 | 0 | 0 | 0 | 4 | 4
086001 | 2016 | 0 | 2 | 0 | 2 | 4
(7 rows)
In your example you join nr against font, whereas you probably want to join it against cana?
Also see here: https://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-JOIN
In the absence of parentheses, JOIN clauses nest left-to-right.
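To make the left-to-right nesting concrete, the FROM clause of the question's query can be rewritten with the parentheses that PostgreSQL implies. This is only a sketch; it uses nothing beyond the tables and columns from the question and should return the same rows as the unparenthesized form:

```sql
-- The question's joins with the implicit left-to-right nesting spelled out.
-- Each FULL OUTER JOIN takes the whole parenthesized result so far as its left side.
SELECT *
FROM (
        (public.brcht
         FULL OUTER JOIN public.cana
           ON brcht.insee = cana.insee AND brcht.annee = cana.annee)
        FULL OUTER JOIN public.font
           ON cana.insee = font.insee AND cana.annee = font.annee
     )
     FULL OUTER JOIN public.nr
       -- nr is matched only against font's columns, which are NULL
       -- whenever font contributed no row.
       ON font.insee = nr.insee AND font.annee = nr.annee;
```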
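Update
To explain the logic:
Try select * from public.brcht and add the other tables one by one: the columns of each table further to the right appear in the result, so when you run all four tables joined you get: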
t=# select *
FROM public.brcht
FULL OUTER JOIN public.cana ON brcht.insee = cana.insee AND brcht.annee = cana.annee
FULL OUTER JOIN public.font ON cana.insee = font.insee AND cana.annee = font.annee
FULL OUTER JOIN public.nr ON font.insee = nr.insee AND font.annee = nr.annee
t-# ;
insee | annee | nb | insee | annee | nb | insee | annee | nb | insee | annee | nb
-------+-------+----+--------+-------+----+-------+-------+----+--------+-------+----
| | | 036223 | 2017 | 1 | | | | | |
| | | 086001 | 2016 | 2 | | | | | |
| | | | | | | | | 036223 | 2013 | 1
| | | | | | | | | 036223 | 2014 | 1
| | | | | | | | | 086001 | 2013 | 1
| | | | | | | | | 086001 | 2014 | 2
| | | | | | | | | 086001 | 2015 | 4
| | | | | | | | | 086001 | 2016 | 2
(8 rows)
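So the 7th and 8th columns are font.insee and font.annee (note that they are NULL everywhere). You join them against nr.insee and nr.annee, and nothing matches, so the full join keeps all rows from the first three tables joined (2 rows) plus all rows from nr (6 rows): you get 8 rows.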
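A quick way to check this reasoning is to count how many joined rows actually carry a font key. The sketch below assumes the sample data from the question; the column aliases are made up for illustration. With that data it should report 8 rows in total, none with a font key, and 6 coming from nr:

```sql
-- count(column) ignores NULLs, so it shows how many rows each table contributed.
SELECT count(*)          AS total_rows,          -- hypothetical alias
       count(font.insee) AS rows_with_font_key,  -- hypothetical alias
       count(nr.insee)   AS rows_from_nr         -- hypothetical alias
FROM public.brcht
FULL OUTER JOIN public.cana ON brcht.insee = cana.insee AND brcht.annee = cana.annee
FULL OUTER JOIN public.font ON cana.insee = font.insee AND cana.annee = font.annee
FULL OUTER JOIN public.nr   ON font.insee = nr.insee AND font.annee = nr.annee;
```

Because a comparison with NULL never evaluates to true, a join condition written against font's columns cannot match any nr row while font is empty.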
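【Comments】:
Why do you join nr against cana? I don't understand how the 4 tables should be joined... In my example, I first join brcht with cana, then cana with font, then font with nr. Proceeding that way seemed logical to me. Is there a logical way to join the tables together?
@wiltomap I tried to explain. Note that if you don't use (), joins happen left to right, so the last join matches the whole preceding set on columns that are NULL: you get everything from (brcht, cana, font) and everything from nr (everything, because they share no values on the columns used for the join). Hope that makes sense; explaining is not my best skill.
OK, I see, thanks! The problem is that the content of the 4 tables changes regularly, so I can't keep adjusting the joins accordingly... I need a way of joining the tables that works whatever the tables contain.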
Then use parentheses to nest the joins, so that each next join is made on the "merged" values.
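One possible reading of joining on the "merged" values is to match each new table against the COALESCE of the keys collected so far, rather than against a single table that may be empty. The sketch below is an assumption about what the comment means, not the commenter's own query; it has much the same effect as the USING (insee, annee) form in Answer 1, and it assumes each (insee, annee) pair appears at most once per table:

```sql
-- Each join compares the new table with the keys merged from all tables to its left,
-- so an empty table in the middle of the chain no longer breaks the matching.
SELECT
    COALESCE(brcht.insee, cana.insee, font.insee, nr.insee) AS insee,
    COALESCE(brcht.annee, cana.annee, font.annee, nr.annee) AS annee,
    COALESCE(brcht.nb, 0) AS brcht,
    COALESCE(cana.nb, 0)  AS cana,
    COALESCE(font.nb, 0)  AS font,
    COALESCE(nr.nb, 0)    AS nr,
    COALESCE(brcht.nb, 0) + COALESCE(cana.nb, 0)
      + COALESCE(font.nb, 0) + COALESCE(nr.nb, 0) AS total
FROM public.brcht
FULL OUTER JOIN public.cana
  ON cana.insee = brcht.insee
 AND cana.annee = brcht.annee
FULL OUTER JOIN public.font
  ON font.insee = COALESCE(brcht.insee, cana.insee)
 AND font.annee = COALESCE(brcht.annee, cana.annee)
FULL OUTER JOIN public.nr
  ON nr.insee = COALESCE(brcht.insee, cana.insee, font.insee)
 AND nr.annee = COALESCE(brcht.annee, cana.annee, font.annee)
ORDER BY 1, 2;
```

If a table can hold several rows for the same (insee, annee) pair, the GROUP BY and SUM() approach from the answers above is still needed on top of this.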