How to merge different schemas of structs in arrays (filling missing columns with null)?
Posted: 2021-11-02 15:12:06
Question:
Given these tables (`a` is an array of structs).
baz_v1 (a is ARRAY<STRUCT<x INT64>>):
+===========+
| a.x | b |
+===========+
| 1 | one |
| 2 | |
+===========+
baz_v2 (a is ARRAY<STRUCT<x INT64, y INT64>>):
+=================+
| a.x | a.y | b |
+=================+
| 3 | 4 | one |
| 5 | 6 | |
+-----------------+
| 7 | 8 | two |
| 9 | 10 | |
+-----------------+
| 11 | 12 | two |
| 13 | 14 | |
+=================+
How can I get the following (concatenated) table/view?
+==================+
| a.x | a.y | b |
+==================+
| 1 | null | one |
| 2 | null | |
+------------------+
| 3 | 4 | one |
| 5 | 6 | |
+------------------+
| 7 | 8 | two |
| 9 | 10 | |
+------------------+
| 11 | 12 | two |
| 13 | 14 | |
+==================+
Code:
WITH `baz_v1` AS (
SELECT
[
STRUCT(1 AS x),
STRUCT(2 AS x)
]
a,
"one" b
), `baz_v2` AS (
SELECT
[
STRUCT(3 AS x, 4 AS y),
STRUCT(5 AS x, 6 AS y)
]
a,
"one" b
UNION ALL
SELECT
[
STRUCT(7 AS x, 8 AS y),
STRUCT(9 AS x, 10 AS y)
]
a,
"two" b
UNION ALL
SELECT
[
STRUCT(11 AS x, 12 AS y),
STRUCT(13 AS x, 14 AS y)
]
a,
"two" b
)
-- todo: Insert magic here, because the below, of course, does not work.
SELECT * FROM baz_v2
UNION ALL
SELECT * FROM baz_v1
Comments:
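a. Can the same value of `b` appear in both tables? b. Can the same value of `b` appear in multiple rows of the same table? c. If the answer to a. or b. is yes, how should those rows be concatenated?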
@MikhailBerlyant Thank you for the very good questions. I have just edited my post to answer them.
So I read it as: a. yes, b. no. Is that correct? (a. and b. refer to the questions in my comment above.)
a) Yes, the same value of `b` can appear in both tables. b) Yes, the same value of `b` can appear in multiple rows of the same table.
Answer 1:
Consider the query below:
select * replace(
array(select as struct x, null as y from t.a) as a)
from `baz_v1` t
union all
select * from `baz_v2`
Applied to the sample data in the question, this produces the requested output.
Comments:
Thanks for the nice solution to the problem.
Answer 2:
Building on the very good answer given by Mikhail Berlyant, I found another solution, one that does not use SELECT * REPLACE:
SELECT ARRAY(SELECT AS STRUCT x, NULL AS y FROM baz_v1.a) AS a, b FROM baz_v1
UNION ALL
SELECT * FROM baz_v2
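Both answers above lean on two implicit behaviours: UNION ALL matches columns by position, and an untyped NULL happens to coerce to the type of `y`. The following is a minimal sketch, not the original authors' code, that makes both explicit and also pins the element order inside the rebuilt array. The aliases `t`, `s`, and `off` are hypothetical names introduced for illustration, and the sample data is shortened from the question.

```sql
WITH baz_v1 AS (
  SELECT [STRUCT(1 AS x), STRUCT(2 AS x)] AS a, "one" AS b
),
baz_v2 AS (
  SELECT [STRUCT(3 AS x, 4 AS y), STRUCT(5 AS x, 6 AS y)] AS a, "one" AS b
  UNION ALL
  SELECT [STRUCT(7 AS x, 8 AS y), STRUCT(9 AS x, 10 AS y)], "two"
)
-- Rebuild each baz_v1 array element as STRUCT<x INT64, y INT64>.
SELECT
  ARRAY(
    SELECT AS STRUCT
      s.x,
      CAST(NULL AS INT64) AS y   -- explicit type for the missing field
    FROM UNNEST(t.a) AS s WITH OFFSET AS off
    ORDER BY off                 -- keep the original element order
  ) AS a,
  t.b
FROM baz_v1 AS t
UNION ALL
-- Columns are listed explicitly so the positional match does not
-- depend on the physical column order of baz_v2.
SELECT a, b
FROM baz_v2
```

Because every input row is rebuilt in place and nothing is grouped, this variant does not care whether values of `b` repeat within or across the tables.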
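Answer 3:
The following approach can be used to merge these tables.
(1) The array column of each table going into the UNION ALL is flattened with UNNEST.
(2) To make the column counts match across the UNION ALL branches, the missing column is added to the table that lacks it with LEFT JOIN (SELECT '' as y) ON FALSE. Because the join condition is always false, y is NULL for every row.
(3) ARRAY_AGG, STRUCT, and GROUP BY are used to turn the flattened rows back into arrays.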
WITH `baz_v1` AS (
SELECT
[
STRUCT(1 AS x),
STRUCT(2 AS x)
] a,
"one" b
), `baz_v2` AS (
SELECT
[
STRUCT(3 AS x, 4 AS y),
STRUCT(5 AS x, 6 AS y)
] a,
"two" b
)
SELECT ARRAY_AGG (
STRUCT(x, y)
) as a,
b
FROM (
SELECT xy.x as x, xy.y as y, b
FROM baz_v2, UNNEST(a) as xy
UNION ALL
SELECT x, y, b
FROM (
SELECT x.x as x, CAST(y as INT64) as y, b
FROM baz_v1, UNNEST(a) as x
LEFT JOIN (SELECT '' as y) ON FALSE
)
)
GROUP BY b
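Note that GROUP BY b in the query above collapses all source rows sharing the same value of `b` into a single array. A possible variant of the same flatten-and-reaggregate idea, sketched here under assumptions rather than taken from the answer, tags every source row with a generated key before unnesting and groups by that key instead. `row_id`, `off`, `t`, `s`, and `flattened` are hypothetical names; GENERATE_UUID() is used only as one convenient way to get a per-row key.

```sql
WITH baz_v1 AS (
  SELECT [STRUCT(1 AS x), STRUCT(2 AS x)] AS a, "one" AS b
),
baz_v2 AS (
  SELECT [STRUCT(7 AS x, 8 AS y), STRUCT(9 AS x, 10 AS y)] AS a, "two" AS b
  UNION ALL
  SELECT [STRUCT(11 AS x, 12 AS y), STRUCT(13 AS x, 14 AS y)], "two"
),
flattened AS (
  -- One output row per array element, tagged with the key of its source row.
  SELECT t.row_id, t.b, off, s.x AS x, s.y AS y
  FROM (SELECT GENERATE_UUID() AS row_id, a, b FROM baz_v2) AS t,
       UNNEST(t.a) AS s WITH OFFSET AS off
  UNION ALL
  -- baz_v1 has no y field, so a typed NULL fills the gap.
  SELECT t.row_id, t.b, off, s.x AS x, CAST(NULL AS INT64) AS y
  FROM (SELECT GENERATE_UUID() AS row_id, a, b FROM baz_v1) AS t,
       UNNEST(t.a) AS s WITH OFFSET AS off
)
SELECT
  ARRAY_AGG(STRUCT(x, y) ORDER BY off) AS a,  -- restore element order
  ANY_VALUE(b) AS b                           -- b is constant within a row_id
FROM flattened
GROUP BY row_id
```

One caveat of this sketch: the comma join with UNNEST drops source rows whose array is empty, so such rows would need separate handling if they can occur.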
Comments:
This looks interesting, thank you. Do I understand correctly that it assumes every value of `b` is unique?
@TobiasHermann Yes, you are right. I'm sorry, but there is a bug: the part that outputs the result as an ARRAY assumes that `b` is unique.
Thanks. That part was not clear in my original question (as Mikhail Berlyant pointed out). I have edited my question to state that `b` is not necessarily unique.