How to pivot a table in BigQuery using standard SQL?

Posted: 2019-01-24 19:25:24

Question:

I am trying to pivot a table in BigQuery using standard SQL. Here is my input table:

with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select * from temp

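I want to pivot this table on the source column and count the number of records (ids) for each combination of the derived source columns. There are only 3 sources: a, b, and c.

My output table should be: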

source_a, source_b, source_c, records
0,0,0, 0
0,0,1, 2
0,1,0, 0
0,1,1, 0
1,0,0, 0
1,0,1, 0
1,1,0, 1
1,1,1, 1

I tried using CASE expressions, but I don't think it works:

with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select case when source = "a" then 1 else 0 end source_a,
case when source = "b" then 1 else 0 end source_b,
case when source = "c" then 1 else 0 end source_c,
count(*) as records
from temp
group by 1 ,2 ,3 

Comments:

What have you tried so far?
I tried a CASE statement, but it does not capture all the combinations.

Answer 1:

The example below is for BigQuery standard SQL:

#standardSQL
WITH temp AS (
  SELECT "1" AS id , "a" AS source UNION ALL
  SELECT "1" AS id , "b" AS source UNION ALL
  SELECT "1" AS id , "c" AS source UNION ALL
  SELECT "2" AS id , "a" AS source UNION ALL
  SELECT "2" AS id , "b" AS source UNION ALL
  SELECT "3" AS id , "c" AS source UNION ALL
  SELECT "4" AS id , "c" AS source
), vals AS (
  SELECT 0 val UNION ALL SELECT 1
), combinations AS (
  SELECT v1.val source_a, v2.val source_b, v3.val source_c
  FROM vals v1
  CROSS JOIN vals v2
  CROSS JOIN vals v3
), facts AS (
  SELECT id,
    MAX(IF(source = 'a', 1, 0)) AS source_a,
    MAX(IF(source = 'b', 1, 0)) AS source_b,
    MAX(IF(source = 'c', 1, 0)) AS source_c
  FROM temp
  GROUP BY id
)
SELECT source_a, source_b, source_c, COUNT(id) records
FROM combinations
LEFT JOIN facts
USING (source_a, source_b, source_c)
GROUP BY source_a, source_b, source_c
ORDER BY source_a, source_b, source_c   

Result:

Row     source_a    source_b    source_c    records  
1       0           0           0           0    
2       0           0           1           2    
3       0           1           0           0    
4       0           1           1           0    
5       1           0           0           0    
6       1           0           1           0    
7       1           1           0           1    
8       1           1           1           1    
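
One way to see why the per-row CASE attempt in the question falls short is to run the facts step of this query on its own. The following is a minimal sketch that reuses the temp data from the question; it only isolates the existing CTE and adds an ORDER BY for readability.

```sql
#standardSQL
-- Intermediate step only: collapse each id into one row of 0/1 flags.
WITH temp AS (
  SELECT "1" AS id , "a" AS source UNION ALL
  SELECT "1" AS id , "b" AS source UNION ALL
  SELECT "1" AS id , "c" AS source UNION ALL
  SELECT "2" AS id , "a" AS source UNION ALL
  SELECT "2" AS id , "b" AS source UNION ALL
  SELECT "3" AS id , "c" AS source UNION ALL
  SELECT "4" AS id , "c" AS source
)
SELECT id,
  -- MAX over the group turns "any row of this id has source X" into a flag.
  MAX(IF(source = 'a', 1, 0)) AS source_a,
  MAX(IF(source = 'b', 1, 0)) AS source_b,
  MAX(IF(source = 'c', 1, 0)) AS source_c
FROM temp
GROUP BY id
ORDER BY id
```

With the sample data this should give one row per id: id 1 has flags (1, 1, 1), id 2 has (1, 1, 0), and ids 3 and 4 both have (0, 0, 1). Grouping by id before counting is what produces combinations such as (1, 1, 0); a CASE evaluated row by row can only ever set one flag, because each input row carries a single source. The combinations that no id matches come from the LEFT JOIN against combinations, where COUNT(id) skips the NULL ids and reports 0.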

Comments:

Thanks Mikhail, can you explain the USING clause?
USING (source_a, source_b, source_c) is equivalent to ON facts.source_a = combinations.source_a AND facts.source_b = combinations.source_b AND facts.source_c = combinations.source_c. As you can see, USING saves some typing. It also resolves ambiguity: with an ON clause you would have to write combinations.source_a, combinations.source_b, combinations.source_c in the SELECT statement. So, again, it saves typing and looks better overall :o)
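
For comparison, the ON form described in the comment could be written roughly as follows. This is a sketch rather than part of the original answer: it keeps the same CTEs and only swaps the join condition, qualifying the selected columns with combinations because both sides of the join expose source_a, source_b, and source_c. The positional GROUP BY and ORDER BY are a choice made here for brevity.

```sql
#standardSQL
WITH temp AS (
  SELECT "1" AS id , "a" AS source UNION ALL
  SELECT "1" AS id , "b" AS source UNION ALL
  SELECT "1" AS id , "c" AS source UNION ALL
  SELECT "2" AS id , "a" AS source UNION ALL
  SELECT "2" AS id , "b" AS source UNION ALL
  SELECT "3" AS id , "c" AS source UNION ALL
  SELECT "4" AS id , "c" AS source
), vals AS (
  SELECT 0 val UNION ALL SELECT 1
), combinations AS (
  SELECT v1.val source_a, v2.val source_b, v3.val source_c
  FROM vals v1
  CROSS JOIN vals v2
  CROSS JOIN vals v3
), facts AS (
  SELECT id,
    MAX(IF(source = 'a', 1, 0)) AS source_a,
    MAX(IF(source = 'b', 1, 0)) AS source_b,
    MAX(IF(source = 'c', 1, 0)) AS source_c
  FROM temp
  GROUP BY id
)
SELECT
  -- With ON, both tables keep their own source_* columns,
  -- so each reference has to say which table it means.
  combinations.source_a AS source_a,
  combinations.source_b AS source_b,
  combinations.source_c AS source_c,
  COUNT(facts.id) AS records
FROM combinations
LEFT JOIN facts
  ON facts.source_a = combinations.source_a
  AND facts.source_b = combinations.source_b
  AND facts.source_c = combinations.source_c
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
```

This should return the same eight rows as the USING version. The columns are taken from combinations rather than facts on purpose: for unmatched combinations the facts side of the LEFT JOIN is NULL, so selecting facts.source_a would lose the flag values for exactly the rows that report 0 records.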
