Pivot table with multiple value columns
I have a Postgres table that contains product data from different manufacturers. Here is the simplified table structure:
CREATE TABLE test_table (
sku text,
manufacturer_name text,
price double precision,
stock int
);
INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
('sku1', 'Manufacturer2', 120.00, 15),
('sku1', 'Manufacturer3', 130.00, 1),
('sku1', 'Manufacturer3', 30.00, 11),
('sku2', 'Manufacturer1', 10.00, 2),
('sku2', 'Manufacturer2', 9.00, 3),
('sku3', 'Manufacturer2', 21.00, 3),
('sku3', 'Manufacturer2', 1.00, 7),
('sku3', 'Manufacturer3', 19.00, 5);
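For each sku I need to output every manufacturer, but if the same sku has several rows for the same manufacturer, I need to pick the row with the lowest price (note that I also need to include the 'stock' column). Here is the desired result: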
| sku | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0 | 22 | 120.0 | 15 | 30.0 | 11 |
| sku2 | 10.0 | 2 | 9.0 | 3 | | |
| sku3 | | | 1.0 | 7 | 19.0 | 5 |
I tried to use the Postgres crosstab() function:
SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
FROM test_table
ORDER BY 1,2',
$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
)
AS ct (sku text, "man1_price" double precision,
"man2_price" double precision,
"man3_price" double precision
);
But this produces a table with only the price column. I did not find a way to include the stock column.
I also tried conditional aggregation:
SELECT sku,
MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku
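This query does not work in my case either. It simply picks the minimum stock level: if the same sku has several rows for the same manufacturer with different price/stock values, the query takes the lowest price from one row and the lowest stock from another.
How can I output each manufacturer's price together with the corresponding stock from this table?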
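To make the mismatch concrete, here is a minimal check against the sample data above. It is only a sketch, restricted to one sku/manufacturer pair that has two rows:

```sql
-- Sample rows for sku1 / Manufacturer3: (130.00, 1) and (30.00, 11).
-- The two MIN() calls are evaluated independently of each other,
-- so the result combines values taken from different rows.
SELECT MIN(price) AS min_price   -- 30, from the row (30.00, 11)
     , MIN(stock) AS min_stock   -- 1,  from the row (130.00, 1)
FROM   test_table
WHERE  sku = 'sku1'
AND    manufacturer_name = 'Manufacturer3';
```

The wanted pair is (30, 11): the stock that belongs to the lowest-priced row.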
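P.S. Thank you all for your help. My Postgres table is fairly small, no more than 15k products (I don't know whether these numbers are useful for a proper comparison), but since Erwin Brandstetter asked for a comparison of the different queries' performance, I ran the queries with EXPLAIN ANALYZE. Here are their execution times: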
Erwin Brandstetter query: 400 - 450 ms
Kjetil S query: 250 - 300 ms
Gordon Linoff query: 200 - 250 ms
a_horse_with_no_name query: 250 - 300 ms
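Again, I am not sure whether these numbers can serve as a reference. For my case I chose a combined version of the Kjetil S and Gordon Linoff queries, but the Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It is worth noting that if my table ends up with more manufacturers in the future, adjusting the query and typing in their names each time would be annoying, so the query from the a_horse_with_no_name answer would be the most convenient.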
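Your last attempt is almost right. But you should add a where condition that removes, for each sku and manufacturer, the rows that do not have the lowest price. This produces your expected result: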
select
sku,
min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
select 1 from test_table
where sku=t.sku
and manufacturer_name=t.manufacturer_name
and price<t.price
)
group by sku
order by 1;
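To see what the NOT EXISTS condition does before the aggregation, the filter can be run by itself. A minimal sketch on the sample data from the question (the alias t2 is only there for readability):

```sql
-- Keep a row only if no other row for the same sku and manufacturer
-- has a strictly lower price.
SELECT sku, manufacturer_name, price, stock
FROM   test_table t
WHERE  NOT EXISTS (
   SELECT 1
   FROM   test_table t2
   WHERE  t2.sku = t.sku
   AND    t2.manufacturer_name = t.manufacturer_name
   AND    t2.price < t.price
   )
ORDER  BY sku, manufacturer_name;
```

On the sample data this leaves seven of the nine rows: ('sku1', 'Manufacturer3', 130.00, 1) and ('sku3', 'Manufacturer2', 21.00, 3) are dropped. With one row left per sku and manufacturer, min(price) and min(stock) read from the same row. If two rows tie on the lowest price, both survive the filter, and min(stock) then returns the smaller of the two stock values.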
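These days I find it much easier to handle complex pivots with a JSON result. Producing a single aggregated JSON value does not violate the inherent restriction of SQL that the number of columns must be known before the query is executed (and must be the same for all rows).
You could use something like this: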
select sku,
jsonb_object_agg(manufacturer_name,
jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
select sku,
manufacturer_name,
price,
min(price) over (partition by sku) as min_price,
stock
from test_table
) t
group by sku;
With your sample data, the above returns the following result:
sku | price_info
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}
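In the sample data, sku1/Manufacturer3 and sku3/Manufacturer2 each appear twice, so the same key reaches jsonb_object_agg more than once. As far as I know, jsonb keeps only the last value for a duplicate key, and the order in which rows reach the aggregate is not guaranteed here. A possible variant, assuming the lowest-priced row per sku and manufacturer should win, reduces the data to one row per pair first (the subquery alias cheapest is made up for this sketch):

```sql
SELECT sku
     , jsonb_object_agg(
          manufacturer_name
        , jsonb_build_object('price', price
                           , 'stock', stock
                           , 'isMinPrice', price = min_price)
       ) AS price_info
FROM  (
   SELECT sku, manufacturer_name, price, stock
        , min(price) OVER (PARTITION BY sku) AS min_price
   FROM  (
      -- one row per (sku, manufacturer_name): the cheapest one
      SELECT DISTINCT ON (sku, manufacturer_name) *
      FROM   test_table
      ORDER  BY sku, manufacturer_name, price
      ) cheapest
   ) t
GROUP  BY sku;
```

A single value can then be read with the JSON operators, for example price_info -> 'Manufacturer1' ->> 'price', which returns the price as text.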
I would use distinct on to limit the data to one price per manufacturer. And I like the filter functionality in Postgres. So:
select sku,
max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
from test_table t
order by manufacturer_name, sku, price
) t
group by sku
order by sku;
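DISTINCT ON keeps exactly one row per (manufacturer_name, sku), namely the first one according to ORDER BY. If two rows tie on the lowest price, which of them is kept is arbitrary unless ORDER BY contains a tiebreaker. A sketch of the subquery alone, assuming that on a price tie the row with the higher stock should win (that rule is an assumption, the question does not state one):

```sql
-- One row per manufacturer and sku: lowest price first,
-- and on equal prices the row with the larger stock.
SELECT DISTINCT ON (manufacturer_name, sku)
       sku, manufacturer_name, price, stock
FROM   test_table
ORDER  BY manufacturer_name, sku, price, stock DESC;
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions; anything after them only decides which row of each group comes first.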
crosstab() must be given a static column definition list. Your second parameter:
$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
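...provides a dynamic list of values, which would require a dynamic column definition list. That is not going to work, except by coincidence.
The core problem of your task is that crosstab() expects a single value column from the query in its first parameter. But you want to process two value columns per row (price and stock).
One way around this is to pack multiple values into a composite type and extract the values in the outer SELECT.
Create the composite type once: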
CREATE TYPE price_stock AS (price float8, stock int);
A temporary table or a view can also serve this purpose. Then:
SELECT sku
, (man1).price, (man1).stock
, (man2).price, (man2).stock
, (man3).price, (man3).stock
FROM crosstab(
'SELECT sku, manufacturer_name, (price, stock)::price_stock
FROM test_table
ORDER BY 1,2'
, $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
)
AS ct (sku text
, man1 price_stock
, man2 price_stock
, man3 price_stock
);
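Two practical notes on the query above. crosstab() comes from the additional module tablefunc, which has to be installed once per database. And the inner query as written does not say which row wins when a sku has more than one row for the same manufacturer. A possible variant, assuming the lowest price should win and that the price_stock type from above exists; the output aliases are taken from the desired result in the question:

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT sku
     , (man1).price AS man1_price, (man1).stock AS man1_stock
     , (man2).price AS man2_price, (man2).stock AS man2_stock
     , (man3).price AS man3_price, (man3).stock AS man3_stock
FROM   crosstab(
   $q$
   -- one row per (sku, manufacturer_name): the cheapest one
   SELECT DISTINCT ON (1, 2)
          sku, manufacturer_name, (price, stock)::price_stock
   FROM   test_table
   ORDER  BY 1, 2, price
   $q$
 , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
   )
AS ct (sku text
     , man1 price_stock
     , man2 price_stock
     , man3 price_stock
      );
```

The dollar-quoted string $q$ ... $q$ is only a convenience so that the inner query can contain a comment and line breaks without escaping quotes.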
For a quick test, or if the rows of the underlying table are not too wide, you can also just use its row type without creating a custom type:
SELECT sku
, (man1).price, (man1).stock
, (man2).price, (man2).stock
, (man3).price, (man3).stock
FROM crosstab(
'SELECT sku, manufacturer_name, t
FROM test_table t
ORDER BY 1,2'
, $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
)
AS ct (sku text
, man1 test_table
, man2 test_table
, man3 test_table
);