BigQuery: Union on repeated fields with different order of fields
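【Posted】: 2020-12-15 15:47:12
【Question】: How can I make UNION ALL work for repeated fields when the order of the fields inside them does not match?
In the example below I try to UNION data_1_nested and data_2_nested. The repeated field nested has two fields, age and grade, but they appear in a different order in each table.
I could UNNEST and re-nest, but that would not be very helpful if I have more than one nested field that has to go through the UNION.
Example: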
with
  data_1 as (
    select 'a123' as id, 1 as age, 'a' as grade
    union all
    select 'a123' as id, 3 as age, 'b' as grade
    union all
    select 'a123' as id, 4.5 as age, 'c' as grade
  ),
  data_2 as (
    select 'b456' as id, 6 as age, 'e' as grade
    union all
    select 'b456' as id, 5 as age, 'f' as grade
    union all
    select 'b456' as id, 2.5 as age, 'g' as grade
  ),
  data_1_nested as (
    select id, array_agg(struct(age, grade)) as nested
    from data_1
    group by 1
  ),
  data_2_nested as (
    select id, array_agg(struct(grade, age)) as nested
    from data_2
    group by 1
  )
select * from data_1_nested
union all
select * from data_2_nested
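For reference, the UNNEST-and-re-nest workaround mentioned in the question could look roughly like the sketch below. It is only an illustration: the inline array literal stands in for data_2_nested, and the aliases t and n are made up for this example.

```sql
-- Stand-in for data_2_nested: the struct fields are in (grade, age) order.
WITH data_2_nested AS (
  SELECT
    'b456' AS id,
    [STRUCT('e' AS grade, 6.0 AS age),
     STRUCT('f' AS grade, 5.0 AS age)] AS nested
)
-- Flatten the array, then aggregate it back with the fields in (age, grade) order.
SELECT
  t.id,
  ARRAY_AGG(STRUCT(n.age AS age, n.grade AS grade)) AS nested
FROM data_2_nested AS t, UNNEST(t.nested) AS n
GROUP BY t.id;
```

With a single repeated field this is manageable, but every additional repeated field would need its own flatten-and-aggregate pass, which is the concern raised in the question.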
【Answer 1】: The following should work for you:
select * from data_1_nested
union all
select id, array(select as struct age, grade from t.nested) from data_2_nested t
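Applied to the sample data from your question, this returns one row per id (a123 and b456), each with nested as an array of (age, grade) structs.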
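The same pattern appears to extend to more than one repeated field by rebuilding each array in its own ARRAY subquery. The sketch below assumes a second repeated field named nested2 and uses inline array literals as stand-ins for the two given tables. The WITH OFFSET ... ORDER BY part is an optional addition to keep the original element order.

```sql
WITH
  data_1_nested AS (
    SELECT
      'a123' AS id,
      [STRUCT(1.0 AS age, 'a' AS grade), STRUCT(3.0 AS age, 'b' AS grade)] AS nested,
      [STRUCT(4.5 AS age, 'c' AS grade)] AS nested2
  ),
  data_2_nested AS (
    SELECT
      'b456' AS id,
      [STRUCT('e' AS grade, 6.0 AS age), STRUCT('f' AS grade, 5.0 AS age)] AS nested,
      [STRUCT('g' AS grade, 2.5 AS age)] AS nested2
  )
SELECT id, nested, nested2 FROM data_1_nested
UNION ALL
SELECT
  id,
  -- Rebuild each array so its struct fields come out in (age, grade) order.
  ARRAY(
    SELECT AS STRUCT n.age, n.grade
    FROM UNNEST(t.nested) AS n WITH OFFSET AS pos
    ORDER BY pos
  ) AS nested,
  ARRAY(
    SELECT AS STRUCT n.age, n.grade
    FROM UNNEST(t.nested2) AS n WITH OFFSET AS pos
    ORDER BY pos
  ) AS nested2
FROM data_2_nested AS t;
```

No GROUP BY is needed, because each array is rebuilt inside its own row. That keeps the cost of adding another repeated field to one extra ARRAY subquery.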
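【Answer 2】: I modified your data slightly to create two nested fields that need to be unioned. I also added a JS function that parses JSON. It is an ugly solution, but it seems to work. I am not sure whether it scales (how many functions would have to be created to cover the different nested fields).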
-- age is declared FLOAT64 because the sample data contains 4.5 and 2.5
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRUCT<age FLOAT64, grade STRING>>
LANGUAGE js AS """
  return JSON.parse(input);
""";
with
  data_1 as (
    select 'a123' as id, 1 as age, 'a' as grade
    union all
    select 'a123' as id, 3 as age, 'b' as grade
    union all
    select 'a123' as id, 4.5 as age, 'c' as grade
  ),
  data_2 as (
    select 'b456' as id, 6 as age, 'e' as grade
    union all
    select 'b456' as id, 5 as age, 'f' as grade
    union all
    select 'b456' as id, 2.5 as age, 'g' as grade
  ),
  data_1_nested as (
    select id,
      array_agg(struct(age, grade)) as nested,
      array_agg(struct(age, grade)) as nested2
    from data_1
    group by 1
  ),
  data_2_nested as (
    select id,
      array_agg(struct(grade, age)) as nested,
      array_agg(struct(grade, age)) as nested2
    from data_2
    group by 1
  )
-- serialize both arrays to JSON, union the strings, then parse them back by field name
select id, JsonToItems(json) as nested, JsonToItems(json2) as nested2 from (
  select id, to_json_string(nested) as json, to_json_string(nested2) as json2 from data_1_nested
  union all
  select id, to_json_string(nested) as json, to_json_string(nested2) as json2 from data_2_nested
);
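As a possible alternative to the JSON round trip, the reordering could be wrapped in a templated SQL function. This is a sketch, not part of the original answer. ReorderItems is a hypothetical name, and it assumes every array passed in has struct fields named age and grade. The inline array literals stand in for the two given tables.

```sql
-- Hypothetical helper: ANY TYPE lets the same function accept arrays whose
-- struct fields are declared in either order; fields are picked by name.
CREATE TEMP FUNCTION ReorderItems(items ANY TYPE) AS (
  ARRAY(
    SELECT AS STRUCT item.age, item.grade
    FROM UNNEST(items) AS item WITH OFFSET AS pos
    ORDER BY pos
  )
);

WITH
  data_1_nested AS (
    SELECT
      'a123' AS id,
      [STRUCT(1.0 AS age, 'a' AS grade), STRUCT(3.0 AS age, 'b' AS grade)] AS nested,
      [STRUCT(4.5 AS age, 'c' AS grade)] AS nested2
  ),
  data_2_nested AS (
    SELECT
      'b456' AS id,
      [STRUCT('e' AS grade, 6.0 AS age), STRUCT('f' AS grade, 5.0 AS age)] AS nested,
      [STRUCT('g' AS grade, 2.5 AS age)] AS nested2
  )
SELECT id, ReorderItems(nested) AS nested, ReorderItems(nested2) AS nested2
FROM data_1_nested
UNION ALL
SELECT id, ReorderItems(nested), ReorderItems(nested2)
FROM data_2_nested;
```

One function then covers every repeated field that shares the age/grade shape. A field with a different set of struct members would still need its own function.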
【Comments】:
Kyrylo, I deliberately reversed grade and age in data_2_nested to show the problem I am actually facing. data_1_nested and data_2_nested are "given", and all manipulation should start from there.