Big Query - Transpose array/json objects into columns


Posted: 2020-10-21 03:52:05

Question:

This question is a follow-up to these two questions:

- Big Query - Transpose arrays into colums
- Big Query - Transpose Specific fields into Columns

We have a table like the following in BigQuery.

Input table:

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | ["a"]     
 Sue  | Interest  | ["a", "b"]
 Joe  | Interest  | ["b"]
 Joe  | Gender    | Male
 Bob  | Gender    | Female
 Sue  | DOB       | 2020-10-17
 Bob  | Others    | {"country" : "es", "language" : "ca"}

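Note: all values in the Answer column are stored as strings, and the arrays/JSON objects are dynamic: their elements and keys are not known in advance.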

We want to transform the table above into the following format so that it works well with BI/visualisation tools.

Desired table:

 +-------------------------------------------------------------+
 | Name | a | b | c | Gender | DOB        | country | language |
 +-------------------------------------------------------------+
 | Bob  | 1 | 0 | 0 | Female |     -      |   es    |   ca     |
 | Sue  | 1 | 1 | 0 |   -    | 2020-10-17 |   -     |   -      |
 | Joe  | 0 | 1 | 0 |  Male  |     -      |   -     |   -      |
 +-------------------------------------------------------------+
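For reference, the reshaping being asked for can be sketched outside BigQuery. The Python below is only an illustration under assumptions: it decodes the stringified Answer values with `json.loads` (the SQL answer further down strips characters and splits strings instead), and `to_wide`, `ROWS` and the `missing="-"` placeholder are hypothetical names. It also shows a consequence of the columns being dynamic: flag columns come only from values that actually occur, so the sample data yields `a` and `b` but no `c`.

```python
import json

# Sample rows copied from the input table above (Answer kept as a string).
ROWS = [
    ("Bob", "Interest", '["a"]'),
    ("Sue", "Interest", '["a", "b"]'),
    ("Joe", "Interest", '["b"]'),
    ("Joe", "Gender", "Male"),
    ("Bob", "Gender", "Female"),
    ("Sue", "DOB", "2020-10-17"),
    ("Bob", "Others", '{"country" : "es", "language" : "ca"}'),
]


def to_wide(rows, missing="-"):
    """Pivot (name, question, answer) rows into one row per name."""
    interests = {}  # name -> set of selected Interest values
    fields = {}     # name -> {column name: value}
    for name, question, answer in rows:
        interests.setdefault(name, set())
        fields.setdefault(name, {})
        if question == "Interest":
            interests[name].update(json.loads(answer))  # JSON array
        elif question == "Others":
            fields[name].update(json.loads(answer))     # JSON object
        else:
            fields[name][question] = answer             # plain string
    # Both column lists are derived from the data, i.e. they are dynamic.
    flag_cols = sorted(set().union(*interests.values()))
    field_cols = sorted({col for f in fields.values() for col in f})
    header = ["Name", *flag_cols, *field_cols]
    table = []
    for name in interests:  # names in first-seen order
        row = [name]
        row += [1 if v in interests[name] else 0 for v in flag_cols]
        row += [fields[name].get(col, missing) for col in field_cols]
        table.append(row)
    return header, table


header, table = to_wide(ROWS)
print(header)
for row in table:
    print(row)
```

Column names are sorted by code point, so upper-case names such as `DOB` and `Gender` come before `country` and `language`.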

Comments:

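Have you tried anything yourself at all? You already have almost everything answered; it only needs a little extra effort! So, did you try? What problems did you run into?

@mikhail I can extract the JSON values with the JSON_EXTRACT function, but extracting them dynamically and turning them into separate columns is where I'm stuck.

I see. Anyway, check out the answer!

Answer 1: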

Below is for BigQuery Standard SQL

#standardSQL
-- Step 1: normalize everything into one (name, question, answer) row per value
create temp table data as
-- Interest: strip brackets, quotes and spaces, then split the list on commas
select name, question, value as answer
from `project.dataset.table`,
unnest(split(translate(answer, '[]" ', ''))) value
where question = 'Interest'
union all
-- Plain scalar questions (Gender, DOB, ...) are kept as they are
select name, question, answer
from `project.dataset.table`
where not question in ('Interest', 'Others')
union all
-- Others: strip braces, quotes and spaces, split on commas, then split each key:value pair on ':'
select name,
  split(value, ':')[offset(0)] as question,
  split(value, ':')[offset(1)] as answer
from `project.dataset.table`,
unnest(split(translate(answer, '{}" ', ''))) value
where question = 'Others';

-- Step 2: build the pivot query from the distinct values and run it
EXECUTE IMMEDIATE (
  SELECT """
    SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ' ORDER BY value)
  FROM (
    SELECT DISTINCT answer AS value FROM data
    WHERE question = 'Interest'
  )
) || (
  SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ' ORDER BY value)
  FROM (
    SELECT DISTINCT question AS value FROM data
    WHERE question != 'Interest'
  )
) || """
  FROM data
  GROUP BY name
""";

To try it on the sample data from your question, you can define `project.dataset.table` with a CTE such as the following (for example, placed directly after `create temp table data as`):

with `project.dataset.table` AS (
  select 'Bob' name, 'Interest' question, '["a"]' answer union all
  select 'Sue', 'Interest', '["a", "b"]' union all
  select 'Joe', 'Interest', '["b"]' union all
  select 'Joe', 'Gender', 'Male' union all
  select 'Bob', 'Gender', 'Female' union all
  select 'Sue', 'DOB', '2020-10-17' union all
  select 'Bob', 'Others', ' {"country" : "es", "language" : "ca"}'
)

The output is: [screenshot of the result not preserved]
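The first statement of the script (the `data` temp table) is the step that makes the dynamic columns discoverable: it turns every array element and every key/value pair into its own row. A rough Python equivalent, assuming the same stripping and splitting rules as the SQL (`normalize` and the `*_STRIP` tables are hypothetical names), might look like this:

```python
# Characters removed before splitting, as in the translate() calls above.
INTEREST_STRIP = str.maketrans("", "", '[]" ')
OTHERS_STRIP = str.maketrans("", "", '{}" ')


def normalize(rows):
    """Mimic the `data` temp table: one (name, question, answer) per value."""
    out = []
    for name, question, answer in rows:
        if question == "Interest":
            # translate(answer, '[]" ', '') then split on ','
            for value in answer.translate(INTEREST_STRIP).split(","):
                out.append((name, question, value))
        elif question == "Others":
            # translate(answer, '{}" ', '') then split on ',' and on ':'
            for value in answer.translate(OTHERS_STRIP).split(","):
                parts = value.split(":")
                # parts[1] raises IndexError on an element without ':',
                # much like OFFSET(1) fails in the SQL version
                out.append((name, parts[0], parts[1]))
        else:
            out.append((name, question, answer))
    return out


sample = [
    ("Bob", "Interest", '["a"]'),
    ("Sue", "Interest", '["a", "b"]'),
    ("Joe", "Gender", "Male"),
    ("Bob", "Others", ' {"country" : "es", "language" : "ca"}'),
]
for row in normalize(sample):
    print(row)
```

Unlike the SQL's UNION ALL, this keeps the input order, which does not matter for the pivot. Like the SQL, it deletes every space, so a value such as "New York" would become "NewYork".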

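Note: the EXECUTE IMMEDIATE part of the script above is essentially the same as in the previous answer. The only change is that the raw data is first prepared into the temp table `data`, which EXECUTE IMMEDIATE then reads, instead of doing that preparation inside EXECUTE IMMEDIATE.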
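The EXECUTE IMMEDIATE statement works by building the pivot query as a string: one `MAX(IF(answer = ...))` flag per distinct Interest value and one `MAX(IF(question = ...))` column per other question. The Python sketch below assembles the same text from already-normalized rows so the generated statement can be inspected; `build_pivot_sql` is a hypothetical helper, and the whitespace is simplified compared with the real script.

```python
def build_pivot_sql(data):
    """Assemble the query text that EXECUTE IMMEDIATE runs in the script above.

    `data` holds normalized (name, question, answer) rows, like the temp table.
    """
    # SELECT DISTINCT ... ORDER BY value, done here with a set plus sorted()
    interest_values = sorted({a for _, q, a in data if q == "Interest"})
    other_questions = sorted({q for _, q, _ in data if q != "Interest"})
    flags = [f"MAX(IF(answer = '{v}', 1, 0)) AS {v}" for v in interest_values]
    fields = [f"MAX(IF(question = '{q}', answer, '-')) AS {q}"
              for q in other_questions]
    return "SELECT name, " + ", ".join(flags + fields) + " FROM data GROUP BY name"


data = [
    ("Bob", "Interest", "a"),
    ("Sue", "Interest", "a"),
    ("Sue", "Interest", "b"),
    ("Joe", "Interest", "b"),
    ("Joe", "Gender", "Male"),
    ("Bob", "Gender", "Female"),
    ("Sue", "DOB", "2020-10-17"),
    ("Bob", "country", "es"),
    ("Bob", "language", "ca"),
]
print(build_pivot_sql(data))
```

Neither this sketch nor the script escapes the values, so a value containing a quote, a space, or a leading digit would produce invalid SQL or an invalid column alias.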

Comments:

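The query above gives me the expected result. However, in the "Others" case (JSON objects), some values are empty JSON strings such as {}. As a result, OFFSET throws an `Array index X is out of bounds (overflow)` error. I replaced it with SAFE_OFFSET() and now it works fine. How can I add a condition to the current query to ignore empty values?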
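One way to ignore empty objects, rather than indexing into them, is to drop empty elements after stripping and splitting. In the SQL this would presumably correspond to an extra filter such as `and value != ''` in the Others branch; that is an untested assumption. The Python sketch below only illustrates the filtering logic, and `split_pairs` is a hypothetical name.

```python
OTHERS_STRIP = str.maketrans("", "", '{}" ')


def split_pairs(answer):
    """Turn a stringified JSON object into (key, value) pairs, skipping empty ones.

    Follows the translate + split(',') + split(':') steps of the script, but
    drops elements that cannot be split instead of indexing into them.
    """
    pairs = []
    for element in answer.translate(OTHERS_STRIP).split(","):
        if not element:        # '{}' becomes '' after stripping
            continue
        parts = element.split(":")
        if len(parts) < 2:     # where SAFE_OFFSET(1) would give NULL, skip instead
            continue
        pairs.append((parts[0], parts[1]))
    return pairs


print(split_pairs("{}"))
print(split_pairs(' {"country" : "es", "language" : "ca"}'))
```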
