PostgreSQL - Group By Two Columns And Use One As Column For Result

Posted: 2021-11-25 07:24:44

Question:

I have two tables, Subject and Journal, as follows:

Subject
 id | name
----------
  1 | fruit
  2 | drink
  3 | vege
  4 | fish

Journal
 id | subj | reference | value
------------------------------
  1 |    1 |       foo |    30
  2 |    2 |       bar |    20
  3 |    1 |       bar |    35
  4 |    1 |       bar |    10
  5 |    2 |       baz |    25
  6 |    4 |       foo |    30
  7 |    4 |       bar |    40
  8 |    1 |       baz |    20
  9 |    2 |       bar |     5

I would like to sum Journal.value by subj and reference.

I know the GROUP BY clause is meant for this, but I would like the output to look like this:

reference | subj_1 | subj_2 | subj_3 | subj_4
          |  fruit |  drink |   vege |   fish (even better)
---------------------------------------------
      foo |     30 |        |        |     30
      bar |     45 |     25 |        |     40
      baz |     20 |     25 |        |

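Is this possible?

Comments:

Yes, have a look at this link: ***.com/questions/20618323/… What you are looking for is called a "pivot table".

Crosstab or pivot questions

Thanks a lot, Mikhail Aksenov and a_horse_with_no_name! This is exactly what I was looking for.

Answer 1: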

You can generate the SQL statement from the current data, and then run the generated statement.

Sample data:

create table Subject (
 id serial primary key, 
 name varchar(30) not null
 );

insert into Subject (id, name) values
 (1 ,'fruit')
,(2 ,'drink')
,(3 ,'vege')
,(4 ,'fish');
 
create table Journal (
 id int, 
 subj int, 
 reference varchar(30), 
 value int
);

insert into Journal   
(id, subj, reference, value) values
 (1, 1, 'foo', 30)
,(2, 2, 'bar', 20)
,(3, 1, 'bar', 35)
,(4, 1, 'bar' ,10)
,(5, 2, 'baz', 25)
,(6, 4, 'foo', 30)
,(7, 4, 'bar', 40)
,(8, 1, 'baz', 20)
,(9, 2, 'bar', 5);
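
The statement generated in the next step calls crosstab(), which is provided by the additional module tablefunc rather than by core PostgreSQL. A minimal setup sketch, assuming you have the privilege to create extensions in the current database:

```sql
-- Install the tablefunc module once per database; it provides crosstab().
CREATE EXTENSION IF NOT EXISTS tablefunc;
```

Without the extension, the crosstab() calls in this thread fail with a "function crosstab(...) does not exist" error.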

Generate the statement:

SELECT $f$SELECT * FROM crosstab(
     $$SELECT DISTINCT ON (1, 2)
       j.reference, 'subj_'||j.subj||'_'||s.name AS data_type, SUM(j.value) AS val
       FROM Journal j
       JOIN Subject s ON s.id = j.subj
       GROUP BY j.reference, j.subj, s.name
       ORDER BY j.reference$$

    ,$$VALUES ($f$     || string_agg(quote_literal(data_type), '), (') || $f$)$$)
AS x (reference text, $f$ || string_agg(quote_ident(data_type), ' int, ') || ' int)'
AS Stmt
FROM  (SELECT concat('subj_', id, '_', name) AS data_type FROM Subject) x

Output:

stmt
----
SELECT * FROM crosstab(
     $$SELECT DISTINCT ON (1, 2)
       j.reference, 'subj_'||j.subj||'_'||s.name AS data_type, SUM(j.value) AS val
       FROM Journal j
       JOIN Subject s ON s.id = j.subj
       GROUP BY j.reference, j.subj, s.name
       ORDER BY j.reference$$

    ,$$VALUES ('subj_1_fruit'), ('subj_2_drink'), ('subj_3_vege'), ('subj_4_fish')$$)
AS x (reference text, subj_1_fruit int, subj_2_drink int, subj_3_vege int, subj_4_fish int)

Run it:

SELECT * FROM crosstab(
     $$SELECT DISTINCT ON (1, 2)
       j.reference, 'subj_'||j.subj||'_'||s.name AS data_type, SUM(j.value) AS val
       FROM Journal j
       JOIN Subject s ON s.id = j.subj
       GROUP BY j.reference, j.subj, s.name
       ORDER BY j.reference$$

    ,$$VALUES ('subj_1_fruit'), ('subj_2_drink'), ('subj_3_vege'), ('subj_4_fish')$$)
AS x (reference text, subj_1_fruit int, subj_2_drink int, subj_3_vege int, subj_4_fish int)

Result:

 reference | subj_1_fruit | subj_2_drink | subj_3_vege | subj_4_fish
---------------------------------------------------------------------
       bar |           45 |           25 |             |          40
       baz |           20 |           25 |             |
       foo |           30 |              |             |          30

db<>fiddle here

Answer 2:

This produces your desired result:

SELECT *
FROM   crosstab(
   'SELECT reference, subj, sum(value)
    FROM   journal
    GROUP  BY 1, 2
    ORDER  BY 1, 2'

  , $$VALUES (1), (2), (3), (4)$$
   ) AS ct (reference text, fruit int, drink int, vege int, fish int);

db<>fiddle here

Except for the sort order, which seems arbitrary?

Detailed explanation and instructions:

PostgreSQL Crosstab Query
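
For comparison, the same pivot can be sketched without crosstab() by using conditional aggregation. This is not taken from either answer; it is an alternative technique that relies on the aggregate FILTER clause (PostgreSQL 9.4 or later) and assumes the four subject ids and names from the sample data, hard-coded:

```sql
-- Pivot by hand: one filtered aggregate per subject id.
-- Subject ids 1..4 and the column names are hard-coded from the sample data.
-- A filtered sum over zero rows yields NULL, which gives the empty cells.
SELECT reference,
       sum(value) FILTER (WHERE subj = 1) AS fruit,
       sum(value) FILTER (WHERE subj = 2) AS drink,
       sum(value) FILTER (WHERE subj = 3) AS vege,
       sum(value) FILTER (WHERE subj = 4) AS fish
FROM   journal
GROUP  BY reference
ORDER  BY reference;
```

Like the crosstab() call, this needs the column list written out in advance, so adding a subject means editing the query or generating it dynamically as in the first answer.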
