BigQuery: Transform data in multiple columns into row format
【Posted】: 2019-07-19 07:30:18 【Question】: Suppose the following table exists in BQ:
SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3, 7 AS col4, 98 AS col5, 77 AS col6, 59 AS col7 UNION ALL
SELECT "Mobile" AS Device, 8 AS col1, 43 AS col2, 75 AS col3, 44 AS col4, 38 AS col5, 31 AS col6, 46 AS col7 UNION ALL
SELECT "Tablet" AS Device, 7 AS col1, 9 AS col2, 34 AS col3, 86 AS col4, 62 AS col5, 69 AS col6, 74 AS col7
The real table can be as wide as roughly 100 columns.
I want to transform this query so that I get the following result table:
SELECT "Desktop" AS Device, 24 AS Nr UNION ALL
SELECT "Desktop" AS Device, 9 AS Nr UNION ALL
SELECT "Desktop" AS Device, 28 AS Nr UNION ALL
SELECT "Desktop" AS Device, 7 AS Nr UNION ALL
SELECT "Desktop" AS Device, 98 AS Nr UNION ALL
SELECT "Desktop" AS Device, 77 AS Nr UNION ALL
SELECT "Desktop" AS Device, 59 AS Nr UNION ALL
SELECT "Mobile" AS Device, 8 AS Nr UNION ALL
SELECT "Mobile" AS Device, 43 AS Nr UNION ALL
SELECT "Mobile" AS Device, 75 AS Nr UNION ALL
Etc
Does anyone know how to achieve this?
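【Answer 1】: Below is a solution for BigQuery Standard SQL. An added bonus is that it does not depend on the number or the names of the columns to unpivot.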
#standardSQL
WITH raw AS (
SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3, 7 AS col4, 98 AS col5, 77 AS col6, 59 AS col7 UNION ALL
SELECT "Mobile" AS Device, 8 AS col1, 43 AS col2, 75 AS col3, 44 AS col4, 38 AS col5, 31 AS col6, 46 AS col7 UNION ALL
SELECT "Tablet" AS Device, 7 AS col1, 9 AS col2, 34 AS col3, 86 AS col4, 62 AS col5, 69 AS col6, 74 AS col7
)
SELECT Device, Nr FROM raw t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Device) FROM UNNEST([t]))), r'":([^,}]*)')) Nr
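To see why the query above works, it can help to look at the intermediate values for a single row. The following is only a sketch on a shortened, three-column version of the sample data. The output column names json_row and extracted are made up for illustration and do not appear in the original answer.

```sql
#standardSQL
WITH raw AS (
  SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3
)
SELECT
  Device,
  -- The row without Device, serialized as one JSON object,
  -- for example {"col1":24,"col2":9,"col3":28}
  TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Device) FROM UNNEST([t]))) AS json_row,
  -- Every value that follows a '":' separator, up to the next ',' or '}'.
  -- The result is an ARRAY<STRING>, so the values are text, not INT64.
  REGEXP_EXTRACT_ALL(
    TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Device) FROM UNNEST([t]))),
    r'":([^,}]*)') AS extracted
FROM raw t
```

Because the values come out of a regular expression, Nr in the main query is a STRING. The pattern also assumes that no value contains a comma or the sequence '":', which holds for plain numeric columns like the ones in the question.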
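Update for OP's comment: "I completely forgot to include in the requirements that the column names should also be added as a separate column"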
#standardSQL
SELECT Device, SPLIT(pair, ':')[OFFSET(0)] AS col, SPLIT(pair, ':')[OFFSET(1)] AS Nr
FROM raw t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Device) FROM UNNEST([t]))), r'[{}"]', ''))) pair
Applied to the same sample data, the result looks like this:
Row Device col Nr
1 Desktop col1 24
2 Desktop col2 9
3 Desktop col3 28
4 Desktop col4 7
5 Desktop col5 98
6 Desktop col6 77
7 Desktop col7 59
8 Mobile col1 8
9 Mobile col2 43
10 Mobile col3 75
11 Mobile col4 44
12 Mobile col5 38
13 Mobile col6 31
14 Mobile col7 46
15 Tablet col1 7
16 Tablet col2 9
17 Tablet col3 34
18 Tablet col4 86
19 Tablet col5 62
20 Tablet col6 69
21 Tablet col7 74
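One detail the table does not show is that SPLIT returns strings, so Nr is a STRING in the updated query. If a numeric column is needed, one possible variation (a sketch, not part of the original answer) wraps the value in SAFE_CAST. It is shown here on a shortened version of the sample data so that it runs on its own.

```sql
#standardSQL
WITH raw AS (
  SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3 UNION ALL
  SELECT "Mobile" AS Device, 8 AS col1, 43 AS col2, 75 AS col3
)
SELECT
  Device,
  SPLIT(pair, ':')[OFFSET(0)] AS col,
  -- SAFE_CAST returns NULL instead of raising an error when the text
  -- is not a valid integer (for example a JSON null).
  SAFE_CAST(SPLIT(pair, ':')[OFFSET(1)] AS INT64) AS Nr
FROM raw t,
UNNEST(SPLIT(REGEXP_REPLACE(
  TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Device) FROM UNNEST([t]))),
  r'[{}"]', ''))) pair
```

This still relies on the values containing no commas or colons, which is a reasonable assumption for integer columns but not for arbitrary strings.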
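【Comments】:
Thnx Mikhail, this is very convenient when there are many columns. However, I completely forgot the requirement that the column names should also be added as a separate column, so that I know, for example, that the value 24 in the first row belongs to "col1". Is that possible as well?
Then why did you accept the previous answer? In any case, post a new question or update this one with whatever you really need, and I will answer or update my answer accordingly. Meanwhile, consider at least voting.
Anyway, please see the update in my answer, and please don't forget to vote.
Mikhail, to answer your question: I accepted the previous answer because it met my original requirement. The extra requirement only came up later, when I first replied to your code. I will take your feedback on board and post a new thread next time. In any case, thanks for the updated code, it works perfectly.
Sure, no problem, understood.
【Answer 2】: You can put the numeric columns into an ARRAY and use UNNEST: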
with raw as (
SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3, 7 AS col4, 98 AS col5, 77 AS col6, 59 AS col7 UNION ALL
SELECT "Mobile" AS Device, 8 AS col1, 43 AS col2, 75 AS col3, 44 AS col4, 38 AS col5, 31 AS col6, 46 AS col7 UNION ALL
SELECT "Tablet" AS Device, 7 AS col1, 9 AS col2, 34 AS col3, 86 AS col4, 62 AS col5, 69 AS col6, 74 AS col7
)
select Device, Nr
from raw
left join UNNEST ([col1, col2, col3,col4,col5,col6,col7]) Nr
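The array approach above drops the column names. If they are needed as well, one way to extend it is to unnest an array of STRUCTs that pair each name with its value. This is a sketch on a shortened version of the sample data rather than part of the original answer. The alias kv is made up for illustration, and a comma (cross) join is used instead of the left join, which gives the same rows here because the array is never empty.

```sql
#standardSQL
WITH raw AS (
  SELECT "Desktop" AS Device, 24 AS col1, 9 AS col2, 28 AS col3 UNION ALL
  SELECT "Mobile" AS Device, 8 AS col1, 43 AS col2, 75 AS col3
)
SELECT Device, kv.col, kv.Nr
FROM raw,
-- One STRUCT per source column: the name as a string literal, the value as is.
UNNEST([
  STRUCT('col1' AS col, col1 AS Nr),
  STRUCT('col2' AS col, col2 AS Nr),
  STRUCT('col3' AS col, col3 AS Nr)
]) AS kv
```

Unlike the JSON-based answer, Nr keeps its INT64 type here, but every column has to be listed by hand, which becomes tedious at around 100 columns. When the column list is known, BigQuery's UNPIVOT operator is another option to consider.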