BigQuery: UNION on repeated fields with a different field order


【Title】: BigQuery: Union on repeated fields with different order of fields 【Posted】: 2020-12-15 15:47:12 【Question】:

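How can I make UNION ALL work for repeated fields when the order of the fields inside them does not match?

In the example below I try to UNION data_1_nested and data_2_nested. The repeated field nested has two fields, age and grade, but they appear in a different order in each table.

I could UNNEST and re-nest, but that is not very helpful if more than one nested field needs the UNION.

Example: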

WITH
data_1 AS (
  SELECT 'a123' AS id, 1 AS age, 'a' AS grade
  UNION ALL
  SELECT 'a123' AS id, 3 AS age, 'b' AS grade
  UNION ALL
  SELECT 'a123' AS id, 4.5 AS age, 'c' AS grade
),
data_2 AS (
  SELECT 'b456' AS id, 6 AS age, 'e' AS grade
  UNION ALL
  SELECT 'b456' AS id, 5 AS age, 'f' AS grade
  UNION ALL
  SELECT 'b456' AS id, 2.5 AS age, 'g' AS grade
),
data_1_nested AS (
  -- struct field order: age, grade
  SELECT id, ARRAY_AGG(STRUCT(age, grade)) AS nested
  FROM data_1
  GROUP BY 1
),
data_2_nested AS (
  -- struct field order: grade, age (reversed)
  SELECT id, ARRAY_AGG(STRUCT(grade, age)) AS nested
  FROM data_2
  GROUP BY 1
)

-- the two nested columns have different struct types, so this UNION ALL is the problem
SELECT * FROM data_1_nested
UNION ALL
SELECT * FROM data_2_nested
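
For reference, the UNNEST-and-re-nest workaround mentioned in the question could look roughly like the sketch below. It is only an illustration: the two CTEs are shortened inline stand-ins for data_1_nested and data_2_nested (array literals instead of ARRAY_AGG), and the aliases t and n are made up for this example.

```sql
WITH
data_1_nested AS (
  -- stand-in for the real table; struct field order: age, grade
  SELECT 'a123' AS id,
         [STRUCT(1.0 AS age, 'a' AS grade), STRUCT(3.0 AS age, 'b' AS grade)] AS nested
),
data_2_nested AS (
  -- stand-in for the real table; struct field order: grade, age
  SELECT 'b456' AS id,
         [STRUCT('e' AS grade, 6.0 AS age), STRUCT('f' AS grade, 5.0 AS age)] AS nested
)

SELECT * FROM data_1_nested
UNION ALL
-- flatten the array, then rebuild it with the fields in (age, grade) order
SELECT t.id, ARRAY_AGG(STRUCT(n.age AS age, n.grade AS grade)) AS nested
FROM data_2_nested AS t, UNNEST(t.nested) AS n
GROUP BY t.id
```

This needs a GROUP BY per re-nested array, and the comma join drops rows whose array is empty, which is part of why it gets awkward once several nested fields are involved.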


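【Answer 1】:

The query below should work for you. It rebuilds the array from data_2_nested with its struct fields in the same order as in data_1_nested:

SELECT * FROM data_1_nested
UNION ALL
-- re-create each struct with the fields in (age, grade) order
SELECT id, ARRAY(SELECT AS STRUCT age, grade FROM t.nested) AS nested
FROM data_2_nested t

Applied to the sample data from your question, this returns one row per id (a123 and b456), each with a nested array of (age, grade) structs.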
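
The same pattern should extend to more than one nested field, since each array is rebuilt independently inside its own subquery and no GROUP BY is needed. A minimal sketch, assuming a second repeated column called nested2 (a hypothetical name, not part of the question's tables) and using shortened inline stand-ins for the two tables:

```sql
WITH
data_1_nested AS (
  -- stand-in data; both arrays use field order: age, grade
  SELECT 'a123' AS id,
         [STRUCT(1.0 AS age, 'a' AS grade)] AS nested,
         [STRUCT(3.0 AS age, 'b' AS grade)] AS nested2
),
data_2_nested AS (
  -- stand-in data; both arrays use field order: grade, age
  SELECT 'b456' AS id,
         [STRUCT('e' AS grade, 6.0 AS age)] AS nested,
         [STRUCT('f' AS grade, 5.0 AS age)] AS nested2
)

SELECT * FROM data_1_nested
UNION ALL
SELECT
  id,
  -- each repeated field is reordered on its own
  ARRAY(SELECT AS STRUCT age, grade FROM t.nested) AS nested,
  ARRAY(SELECT AS STRUCT age, grade FROM t.nested2) AS nested2
FROM data_2_nested AS t
```

Every additional nested field costs one more ARRAY(SELECT AS STRUCT ...) expression, with the fields listed in the order used by the first table.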


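【Answer 2】:

I modified your data slightly to create 2 nested fields that need to be unioned. I also added a JS function that parses JSON. It is an ugly solution, but it seems to work. I am not sure it scales, since it is unclear how many functions would have to be created to cover different nested fields.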

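-- Parses a JSON array of objects back into an array of structs.
-- age is declared FLOAT64 because the sample data contains non-integer ages (4.5, 2.5).
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRUCT<age FLOAT64, grade STRING>>
LANGUAGE js AS """
return JSON.parse(input);
""";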

with
data_1 as (
Select 'a123' as id, 1 as age, 'a' as grade
union all
Select 'a123' as id, 3 as age,'b' as grade
union all
Select 'a123' as id, 4.5 as age,'c' as grade
)
,
data_2 as (
Select 'b456' as id, 6 as age,'e' as grade
union all
Select 'b456' as id, 5 as age,'f' as grade
union all
Select 'b456' as id, 2.5 as age,'g' as grade
)
,
data_1_nested as (
SELECT id,
       array_agg(STRUCT(
                      age,grade
                        ))  as nested,
       array_agg(STRUCT(
                      age,grade
                        ))  as nested2
from data_1
group by 1
)
,
data_2_nested as (
SELECT id,
       array_agg(STRUCT(
                      grade, age
                        ))  as nested,
       array_agg(STRUCT(
              grade, age
                ))  as nested2
from data_2
group by 1
)

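-- serialize both arrays to JSON so the UNION ALL only sees STRING columns,
-- then parse them back into one common struct type
select id, JsonToItems(json) as nested, JsonToItems(json2) as nested2 from (
    SELECT id, TO_JSON_STRING(nested) as json, TO_JSON_STRING(nested2) as json2 from data_1_nested
    union all
    SELECT id, TO_JSON_STRING(nested) as json, TO_JSON_STRING(nested2) as json2 from data_2_nested
  );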

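【Comments】:

Kyrylo, I deliberately reversed grade and age in data_2_nested to show the problem I am actually facing. data_1_nested and data_2_nested are "given", and every operation has to start from there.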
