How to pivot a table in BigQuery using standard SQL?
【Posted】: 2019-01-24 19:25:24 【Question】: I am trying to pivot a table in BigQuery using standard SQL. Here is my input table:
with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select * from temp
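I want to pivot this table on the source column and count the number of records for each combination of the derived source columns. I only have 3 sources: a, b and c.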
My output table should be:
source_a, source_b, source_c, records
0,0,0, 0
0,0,1, 2
0,1,0, 0
0,1,1, 0
1,0,0, 0
1,0,1, 0
1,1,0, 1
1,1,1, 1
I tried using a case statement, but I don't think it works:
with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select case when source = "a" then 1 else 0 end source_a,
case when source = "b" then 1 else 0 end source_b,
case when source = "c" then 1 else 0 end source_c,
count(*) as records
from temp
group by 1 ,2 ,3
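To see why this attempt misses combinations, it helps to run it locally. The sketch below is not BigQuery: it assumes Python's built-in sqlite3 module as a stand-in engine, and the table name source_rows is a hypothetical replacement for the temp CTE. Because the CASE expressions are evaluated per row, each row sets exactly one flag, so only one-hot combinations can ever appear.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery so the query can run locally.
# `source_rows` is a hypothetical table name replacing the `temp` CTE.
rows = [("1", "a"), ("1", "b"), ("1", "c"),
        ("2", "a"), ("2", "b"),
        ("3", "c"), ("4", "c")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_rows (id TEXT, source TEXT)")
conn.executemany("INSERT INTO source_rows VALUES (?, ?)", rows)

# Same shape as the query above: flags are computed per row, not per id.
row_level = conn.execute("""
    SELECT CASE WHEN source = 'a' THEN 1 ELSE 0 END AS source_a,
           CASE WHEN source = 'b' THEN 1 ELSE 0 END AS source_b,
           CASE WHEN source = 'c' THEN 1 ELSE 0 END AS source_c,
           COUNT(*) AS records
    FROM source_rows
    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3
""").fetchall()

for row in row_level:
    print(row)
```

Under these assumptions the query returns only three rows, one per source, and never a row such as 1,1,0, because no single input row has two sources. The flags have to be aggregated per id before grouping.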
【Question discussion】:
What have you tried so far?
I tried a case statement, but it doesn't capture all combinations.
【Answer 1】: The example below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT "1" AS id , "a" AS source UNION ALL
SELECT "1" AS id , "b" AS source UNION ALL
SELECT "1" AS id , "c" AS source UNION ALL
SELECT "2" AS id , "a" AS source UNION ALL
SELECT "2" AS id , "b" AS source UNION ALL
SELECT "3" AS id , "c" AS source UNION ALL
SELECT "4" AS id , "c" AS source
), vals AS (
SELECT 0 val UNION ALL SELECT 1
), combinations AS (
SELECT v1.val source_a, v2.val source_b, v3.val source_c
FROM vals v1
CROSS JOIN vals v2
CROSS JOIN vals v3
), facts AS (
SELECT id,
MAX(IF(source = 'a', 1, 0)) AS source_a,
MAX(IF(source = 'b', 1, 0)) AS source_b,
MAX(IF(source = 'c', 1, 0)) AS source_c
FROM temp
GROUP BY id
)
SELECT source_a, source_b, source_c, COUNT(id) records
FROM combinations
LEFT JOIN facts
USING (source_a, source_b, source_c)
GROUP BY source_a, source_b, source_c
ORDER BY source_a, source_b, source_c
Result
Row source_a source_b source_c records
1 0 0 0 0
2 0 0 1 2
3 0 1 0 0
4 0 1 1 0
5 1 0 0 0
6 1 0 1 0
7 1 1 0 1
8 1 1 1 1
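The two steps of the query above (one flag row per id, then a left join against every possible flag combination) can be sketched outside SQL as well. This is a plain Python illustration of the same logic, not BigQuery code; the variable names are chosen here for illustration only.

```python
from collections import Counter
from itertools import product

rows = [("1", "a"), ("1", "b"), ("1", "c"),
        ("2", "a"), ("2", "b"),
        ("3", "c"), ("4", "c")]
sources = ("a", "b", "c")

# Step 1 (mirrors the `facts` CTE): collapse to one flag tuple per id.
seen = {}
for id_, source in rows:
    seen.setdefault(id_, set()).add(source)
facts = {id_: tuple(int(s in have) for s in sources)
         for id_, have in seen.items()}

# Step 2 (mirrors `combinations` LEFT JOIN `facts`): enumerate all 2^3
# flag combinations and count matching ids; missing combinations count 0.
counts = Counter(facts.values())
result = [combo + (counts[combo],)
          for combo in product((0, 1), repeat=len(sources))]

for row in result:
    print(row)
```

Enumerating the combinations first is what guarantees the zero-count rows appear. Counting the facts alone would only list combinations that actually occur.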
【Discussion】:
Thanks Mikhail, can you explain the USING clause?
USING (source_a, source_b, source_c) is equivalent to ON facts.source_a = combinations.source_a AND facts.source_b = combinations.source_b AND facts.source_c = combinations.source_c. As you can see, USING saves some typing. It also resolves ambiguity: with an ON clause you would have to write combinations.source_a, combinations.source_b, combinations.source_c in the SELECT statement. So, again, it saves typing and looks better overall :o)
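The USING versus ON comparison can be checked on a small scale. The sketch below again assumes Python's sqlite3 module as a local stand-in rather than BigQuery, with a few hand-picked rows in tables named after the combinations and facts CTEs. It runs the join both ways and also tries an unqualified column with ON, which SQLite rejects as ambiguous. BigQuery's exact error text is not shown here and will differ.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; the rows are a small sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE combinations (source_a INT, source_b INT, source_c INT);
    CREATE TABLE facts (id TEXT, source_a INT, source_b INT, source_c INT);
    INSERT INTO combinations VALUES (0, 0, 1), (1, 1, 0), (0, 1, 0);
    INSERT INTO facts VALUES ('3', 0, 0, 1), ('4', 0, 0, 1), ('2', 1, 1, 0);
""")

# USING: the shared columns can be referenced without a table qualifier.
using_rows = conn.execute("""
    SELECT source_a, source_b, source_c, COUNT(id) AS records
    FROM combinations
    LEFT JOIN facts USING (source_a, source_b, source_c)
    GROUP BY source_a, source_b, source_c
    ORDER BY source_a, source_b, source_c
""").fetchall()

# ON: the same join, but every shared column must be qualified.
on_rows = conn.execute("""
    SELECT combinations.source_a, combinations.source_b,
           combinations.source_c, COUNT(facts.id) AS records
    FROM combinations
    LEFT JOIN facts
      ON facts.source_a = combinations.source_a
     AND facts.source_b = combinations.source_b
     AND facts.source_c = combinations.source_c
    GROUP BY combinations.source_a, combinations.source_b,
             combinations.source_c
    ORDER BY combinations.source_a, combinations.source_b,
             combinations.source_c
""").fetchall()

# With ON, an unqualified shared column is ambiguous in SQLite.
try:
    conn.execute("""
        SELECT source_a FROM combinations
        LEFT JOIN facts ON facts.source_a = combinations.source_a
    """)
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

print(using_rows == on_rows, ambiguous)
```

Both joins return the same rows in this sample; the difference is only in how much qualification the SELECT and GROUP BY lists need.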