Pivot table with multiple value columns

I have a Postgres table that contains product data from different manufacturers. Here is the simplified table structure:

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

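I need to output every manufacturer for each sku, but if the same manufacturer appears several times for the same sku, I need to pick the row with the lowest price (note that I also need to include the 'stock' column). Here is the desired result: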

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |
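To make the requirement concrete, here is a small in-memory Python sketch (not part of the question) that computes the desired result from the sample rows. Its point is that price and stock are taken together from the cheapest row of each (sku, manufacturer) pair, never chosen independently. The helper names `cheapest_per_pair` and `pivot` are hypothetical.

```python
# Sample data from test_table: (sku, manufacturer_name, price, stock)
ROWS = [
    ("sku1", "Manufacturer1", 110.00, 22),
    ("sku1", "Manufacturer2", 120.00, 15),
    ("sku1", "Manufacturer3", 130.00, 1),
    ("sku1", "Manufacturer3", 30.00, 11),
    ("sku2", "Manufacturer1", 10.00, 2),
    ("sku2", "Manufacturer2", 9.00, 3),
    ("sku3", "Manufacturer2", 21.00, 3),
    ("sku3", "Manufacturer2", 1.00, 7),
    ("sku3", "Manufacturer3", 19.00, 5),
]
MANUFACTURERS = ["Manufacturer1", "Manufacturer2", "Manufacturer3"]


def cheapest_per_pair(rows):
    """Keep the (price, stock) pair of the cheapest row per (sku, manufacturer)."""
    best = {}
    for sku, man, price, stock in rows:
        key = (sku, man)
        # Compare on price only; price and stock always travel together.
        if key not in best or price < best[key][0]:
            best[key] = (price, stock)
    return best


def pivot(rows, manufacturers):
    """One output row per sku: sku, then price/stock per manufacturer (None if absent)."""
    best = cheapest_per_pair(rows)
    result = []
    for sku in sorted({r[0] for r in rows}):
        out = [sku]
        for man in manufacturers:
            price, stock = best.get((sku, man), (None, None))
            out += [price, stock]
        result.append(tuple(out))
    return result


for row in pivot(ROWS, MANUFACTURERS):
    print(row)
```

The None values correspond to the empty cells in the table above.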

I tried to use Postgres crosstab():

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );

But this produces a table with only one price column. I couldn't find a way to include the stock column.

I also tried conditional aggregation:

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku

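This query doesn't work in my case either: it simply picks the minimum stock level. If the same manufacturer is listed more than once for a sku with different price/stock values, the query takes the lowest price from one row and the minimum stock from another.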
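The mix-up can be reproduced with just the two Manufacturer3 rows for sku1. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so it runs without a server); the CASE/MIN expressions are the question's own, reduced to one manufacturer.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (sku TEXT, manufacturer_name TEXT, price REAL, stock INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?, ?)",
    [
        ("sku1", "Manufacturer3", 130.00, 1),
        ("sku1", "Manufacturer3", 30.00, 11),
    ],
)

# MIN(price) and MIN(stock) are evaluated independently of each other,
# so each one may come from a different source row.
row = conn.execute(
    """
    SELECT sku,
           MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) AS man3_price,
           MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) AS man3_stock
    FROM test_table
    GROUP BY sku
    """
).fetchone()

print(row)  # ('sku1', 30.0, 1): price from the 30.00 row, stock from the 130.00 row
```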

How can I output each manufacturer's price together with the corresponding stock from this table?

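P.S. Thanks, everyone, for the help. My Postgres table is fairly small, no more than 15k products (I don't know whether these numbers are useful for a proper comparison), but since Erwin Brandstetter asked for a performance comparison of the different queries, I ran them with EXPLAIN ANALYZE. Here are their execution times: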

Erwin Brandstetter query:        400 - 450 ms 
Kjetil S query:                  250 - 300 ms
Gordon Linoff query:             200 - 250 ms
a_horse_with_no_name query:      250 - 300 ms

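Again, I'm not sure these numbers can serve as a reference. For my case I chose a combined version of the Kjetil S and Gordon Linoff queries, but the Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It's worth noting that if my table ends up with more manufacturers in the future, adjusting the query and typing in their names every time would be annoying, so the query from a_horse_with_no_name's answer would be the most convenient.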

Answer

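Your last query is almost there. You just need to add a where condition that removes, for each sku and manufacturer, every row that doesn't have the lowest price. This produces your expected result: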

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;
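A quick way to check this answer without a Postgres server is to run the same query against an in-memory SQLite database from Python. SQLite is an assumed stand-in here; the query text is the answer's, with only a comment added.

```python
import sqlite3

ROWS = [
    ("sku1", "Manufacturer1", 110.00, 22),
    ("sku1", "Manufacturer2", 120.00, 15),
    ("sku1", "Manufacturer3", 130.00, 1),
    ("sku1", "Manufacturer3", 30.00, 11),
    ("sku2", "Manufacturer1", 10.00, 2),
    ("sku2", "Manufacturer2", 9.00, 3),
    ("sku3", "Manufacturer2", 21.00, 3),
    ("sku3", "Manufacturer2", 1.00, 7),
    ("sku3", "Manufacturer3", 19.00, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (sku TEXT, manufacturer_name TEXT, price REAL, stock INTEGER)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
select
  sku,
  min(case when manufacturer_name='Manufacturer1' then price end) man1_price,
  min(case when manufacturer_name='Manufacturer1' then stock end) man1_stock,
  min(case when manufacturer_name='Manufacturer2' then price end) man2_price,
  min(case when manufacturer_name='Manufacturer2' then stock end) man2_stock,
  min(case when manufacturer_name='Manufacturer3' then price end) man3_price,
  min(case when manufacturer_name='Manufacturer3' then stock end) man3_stock
from test_table t
-- drop every row for which a cheaper row exists for the same sku and manufacturer
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1
"""

rows = conn.execute(QUERY).fetchall()
for r in rows:
    print(r)
```

One caveat: if two rows of the same sku and manufacturer tie on the lowest price, NOT EXISTS keeps both, and min(stock) then returns the smaller stock of the two.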

Another answer

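I find complex pivots much easier these days with a JSON result. Producing a single aggregated JSON value doesn't break SQL's inherent restriction that the number of columns must be known before the query is executed (and must be the same for all rows).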

You can use something like this:

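select sku, 
       -- jsonb keeps only the last value for a duplicate key, so order the aggregate
       -- by descending price to let the cheapest row per manufacturer win
       jsonb_object_agg(manufacturer_name, 
                        jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)
                        order by price desc) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         -- cheapest price per sku across all manufacturers
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;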

With your sample data, the above returns the following:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}                                                                       
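To show what the JSON answer computes, here is a pure-Python sketch of the same shape: one object per sku, keyed by manufacturer, holding the cheapest row per manufacturer plus an isMinPrice flag that mirrors min(price) over (partition by sku). The function name `price_info` is hypothetical; it models the result, not how Postgres evaluates the query.

```python
import json

ROWS = [
    ("sku1", "Manufacturer1", 110.00, 22),
    ("sku1", "Manufacturer2", 120.00, 15),
    ("sku1", "Manufacturer3", 130.00, 1),
    ("sku1", "Manufacturer3", 30.00, 11),
    ("sku2", "Manufacturer1", 10.00, 2),
    ("sku2", "Manufacturer2", 9.00, 3),
    ("sku3", "Manufacturer2", 21.00, 3),
    ("sku3", "Manufacturer2", 1.00, 7),
    ("sku3", "Manufacturer3", 19.00, 5),
]


def price_info(rows):
    # Object keys must be unique, so resolve duplicates to the cheapest row first.
    best = {}
    for sku, man, price, stock in rows:
        if (sku, man) not in best or price < best[(sku, man)][0]:
            best[(sku, man)] = (price, stock)
    # Equivalent of min(price) over (partition by sku).
    min_price = {}
    for sku, _man, price, _stock in rows:
        min_price[sku] = min(price, min_price.get(sku, price))
    # Nest as {sku: {manufacturer: {...}}}.
    result = {}
    for (sku, man), (price, stock) in sorted(best.items()):
        result.setdefault(sku, {})[man] = {
            "price": price,
            "stock": stock,
            "isMinPrice": price == min_price[sku],
        }
    return result


info = price_info(ROWS)
for sku, obj in info.items():
    print(sku, json.dumps(obj))
```

Because manufacturer names become keys instead of columns, a new manufacturer appears in the output without any change to the code, which is the convenience mentioned in the question's P.S.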

Another answer

I would use distinct on to limit the data to one price per manufacturer and sku. I like the filter feature in Postgres. So:

select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
      from test_table t
      order by manufacturer_name, sku, price
     ) t
group by sku
order by sku;
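SQLite has no DISTINCT ON, so this runnable Python sketch swaps in row_number() over (partition by manufacturer_name, sku order by price) = 1 to keep the cheapest row per pair, and reuses the answer's filter aggregates unchanged. It is a substitute technique, not the answer's exact query, and it assumes a SQLite build of at least 3.30 (the first release with aggregate FILTER support).

```python
import sqlite3

ROWS = [
    ("sku1", "Manufacturer1", 110.00, 22),
    ("sku1", "Manufacturer2", 120.00, 15),
    ("sku1", "Manufacturer3", 130.00, 1),
    ("sku1", "Manufacturer3", 30.00, 11),
    ("sku2", "Manufacturer1", 10.00, 2),
    ("sku2", "Manufacturer2", 9.00, 3),
    ("sku3", "Manufacturer2", 21.00, 3),
    ("sku3", "Manufacturer2", 1.00, 7),
    ("sku3", "Manufacturer3", 19.00, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (sku TEXT, manufacturer_name TEXT, price REAL, stock INTEGER)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select t.*,
             -- stand-in for DISTINCT ON: number each pair's rows by ascending price
             row_number() over (partition by manufacturer_name, sku order by price) as rn
      from test_table t
     ) t
where rn = 1
group by sku
order by sku
"""

rows = conn.execute(QUERY).fetchall()
for r in rows:
    print(r)
```

As with DISTINCT ON, a tie on the lowest price keeps one of the tied rows, and which one is not specified.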

Another answer

crosstab() must be given a static column definition list. Your second parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

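...provides a dynamic list of values that would require a dynamic column definition list. That's not going to work, except by coincidence.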

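The core problem of your task is that crosstab() expects a single value column from the query in its first parameter. But you want to handle two value columns per row (price and stock).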

One way around this is to pack multiple values into a composite type and extract them again in the outer SELECT.

Create the composite type once:

CREATE TYPE price_stock AS (price float8, stock int);

A temporary table or a view can serve the same purpose. Then:

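SELECT sku
     , (man1).price AS man1_price, (man1).stock AS man1_stock
     , (man2).price AS man2_price, (man2).stock AS man2_stock
     , (man3).price AS man3_price, (man3).stock AS man3_stock
FROM   crosstab(
   -- DISTINCT ON keeps only the cheapest row per (sku, manufacturer_name)
   'SELECT DISTINCT ON (sku, manufacturer_name)
           sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY sku, manufacturer_name, price'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );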
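To illustrate why packing works, here is a hypothetical pure-Python model of crosstab(source_sql, category_sql): consecutive rows with the same row name form one output row, each value lands in the slot of its category, and absent categories stay empty. The packed (price, stock) tuple plays the role of the price_stock composite, and `unpack` stands in for the outer (man1).price, (man1).stock selections. The source rows are assumed to hold one row per (sku, manufacturer), already reduced to the cheapest one. This models only the behavior used here, not tablefunc's implementation.

```python
from itertools import groupby

# (row_name, category, packed value), ordered by row_name as crosstab requires.
SOURCE = [
    ("sku1", "Manufacturer1", (110.0, 22)),
    ("sku1", "Manufacturer2", (120.0, 15)),
    ("sku1", "Manufacturer3", (30.0, 11)),
    ("sku2", "Manufacturer1", (10.0, 2)),
    ("sku2", "Manufacturer2", (9.0, 3)),
    ("sku3", "Manufacturer2", (1.0, 7)),
    ("sku3", "Manufacturer3", (19.0, 5)),
]
CATEGORIES = ["Manufacturer1", "Manufacturer2", "Manufacturer3"]


def crosstab_sketch(source, categories):
    """Hypothetical model: one output row per run of equal row names."""
    slot = {cat: i for i, cat in enumerate(categories)}
    out = []
    for row_name, group in groupby(source, key=lambda r: r[0]):
        values = [None] * len(categories)  # missing categories stay NULL
        for _, cat, value in group:
            if cat in slot:  # in this sketch, unknown categories are skipped
                values[slot[cat]] = value
        out.append((row_name, *values))
    return out


def unpack(row):
    """Flatten each packed (price, stock) value, like (manN).price, (manN).stock."""
    sku, *packed = row
    flat = [sku]
    for p in packed:
        flat += list(p) if p is not None else [None, None]
    return tuple(flat)


result = [unpack(r) for r in crosstab_sketch(SOURCE, CATEGORIES)]
for r in result:
    print(r)
```

The single "value column" carries a whole tuple, so the one-value-column limit of crosstab() no longer restricts how many fields come out the other side.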

For a quick test, or if the rows of the underlying table aren't too wide, you can also just use its row type without creating a custom type:

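SELECT sku
     , (man1).price AS man1_price, (man1).stock AS man1_stock
     , (man2).price AS man2_price, (man2).stock AS man2_stock
     , (man3).price AS man3_price, (man3).stock AS man3_stock
FROM   crosstab(
   -- DISTINCT ON keeps only the cheapest row per (sku, manufacturer_name)
   'SELECT DISTINCT ON (sku, manufacturer_name)
           sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY sku, manufacturer_name, price'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 test_table
            , man2 test_table
            , man3 test_table
    );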

db<>fiddle here
