How to concatenate two arrays of STRUCT in BigQuery?
【Posted】: 2017-04-22 06:26:37
【Question】:
I am trying to concatenate two arrays of structs in my query and keep getting a signature error. The two structs are identical (their fields match in type and number).
select order_id, case when h.filled is not null and rf.new is not null then array_concat( h.filled, rf.new) else null end filled_and_new from....
It gives the error:
Error: No matching signature for function ARRAY_CONCAT for argument types: ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>. Supported signature: ARRAY_CONCAT(ARRAY, [ARRAY, ...]) at [10:18]
Does this mean array_concat cannot merge two arrays of structs (with exactly the same layout)?
Thanks
Below are the definitions of the two arrays:
reservations_filled RECORD REPEATED
reservations_filled.reservation_id STRING NULLABLE
reservations_filled.s1_order_id STRING NULLABLE
reservations_filled.s2_order_id STRING NULLABLE
reservations_filled.amount INTEGER NULLABLE
reservations_filled.created_time TIMESTAMP NULLABLE
reservations_filled.updated_time TIMESTAMP NULLABLE
reservations_filled.state STRING NULLABLE
reservations_filled.rate FLOAT NULLABLE
reservations_filled.u_amount INTEGER NULLABLE
reservations_filled.u_fees INTEGER NULLABLE
And the array in the joined table:
rsrvtn_array RECORD REPEATED
rsrvtn_array.reservation_id STRING NULLABLE
rsrvtn_array.s1_order_id STRING NULLABLE
rsrvtn_array.s2_order_id STRING NULLABLE
rsrvtn_array.amount INTEGER NULLABLE
rsrvtn_array.created TIMESTAMP NULLABLE
rsrvtn_array.updated TIMESTAMP NULLABLE
rsrvtn_array.state STRING NULLABLE
rsrvtn_array.rate FLOAT NULLABLE
rsrvtn_array.u_amount INTEGER NULLABLE
rsrvtn_array.u_fees INTEGER NULLABLE
The query is:
select t1.rsrvtn_array a, t2.reservations_filled b, array_concat(t1.rsrvtn_array, t2.reservations_filled) c from temp.new_orders t1 join temp.order_history t2 using(order_id)
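【Comments】:
Check the tables' schemas in the UI, because the fields in the two STRUCTs must be different. The error message truncates type names to avoid being overly verbose. It would help to include the fields of both STRUCTs in your question.
【Answer 1】:
Does this mean array_concat cannot merge two arrays of structs (with exactly the same layout)?
ARRAY_CONCAT will merge two arrays of STRUCTs that have the same schema. See the example below as proof: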
#standardSQL
with data AS (
SELECT
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r1', 's1', 'b1')] AS x1,
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r2', 's2', 'b2'), ('r3', 's3', 'b3')] AS x2
UNION ALL
SELECT
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r5', 's5', 'b5')] AS x1,
NULL AS x2
)
SELECT ARRAY_CONCAT(x1, x2) AS y
FROM data
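So most likely the schemas of your two arrays are in fact different, in which case the error message is exactly the one you are seeing. The example below reproduces that situation: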
#standardSQL
WITH data1 AS (
SELECT 1 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, c_id STRING>>
[('r1', 's1', 'b1', 'c1')] AS x1
UNION ALL
SELECT 2 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, c_id STRING>>
[('r5', 's5', 'b5', 'c5')] AS x1
),
data2 AS (
SELECT 1 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, cc_id STRING>>
[('r2', 's2', 'b2', 'c2'), ('r3', 's3', 'b3', 'c3')] AS x2
UNION ALL
SELECT 2 AS id, NULL AS x2
)
SELECT data1.id, ARRAY_CONCAT(data1.x1, data2.x2) AS y
FROM data1
JOIN data2
ON data1.id = data2.id
The error here is exactly the one you see in your example:
Error: No matching signature for function ARRAY_CONCAT for argument types:
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>,
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>.
Supported signature: ARRAY_CONCAT(ARRAY, [ARRAY, ...]) at [15:23]
The error message is truncated, so the fields you can see really are the same, but the last field (c_id versus cc_id), which is cut off, differs between the two arrays.
Hope this helps!
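Since the error message hides the differing fields, one way to see the full field list is to query the dataset's metadata. The following is a minimal sketch, assuming the tables and columns named in the question (temp.new_orders.rsrvtn_array and temp.order_history.reservations_filled) and that the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view is available in your project; it is not part of the original answer.

```sql
#standardSQL
-- Sketch: list every nested field of the two REPEATED RECORD columns
-- so that their names and types can be compared side by side.
-- Dataset, table, and column names are taken from the question.
SELECT
  table_name,
  column_name,
  field_path,
  data_type
FROM temp.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE (table_name = 'new_orders' AND column_name = 'rsrvtn_array')
   OR (table_name = 'order_history' AND column_name = 'reservations_filled')
ORDER BY table_name, field_path
```

The row whose field_path equals the column name carries the complete ARRAY<STRUCT<...>> type in data_type, which is the part the error message abbreviates with "...".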
Update
Look at the following fragments of the two schemas:
reservations_filled.created_time TIMESTAMP NULLABLE
reservations_filled.updated_time TIMESTAMP NULLABLE
and
rsrvtn_array.created TIMESTAMP NULLABLE
rsrvtn_array.updated TIMESTAMP NULLABLE
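This is clearly the very situation I anticipated in the example above: the field names differ (created_time / updated_time versus created / updated).
Solution
So, the query below fails as expected: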
#standardSQL
WITH t1 AS (
SELECT 1 AS id,
ARRAY<STRUCT<a STRING, b STRING, cc STRING>>[('a1', 'b1', 'c1')] AS x
),
t2 AS (
SELECT 1 AS id,
ARRAY<STRUCT<a STRING, b STRING, c STRING>>[('a2', 'b2', 'c2')] AS y
)
SELECT x, y, ARRAY_CONCAT(x, y) AS z
FROM t1 JOIN t2 USING(id)
because (a, b, cc) and (a, b, c) differ in one field name.
And the query below works:
#standardSQL
WITH t1 AS (
SELECT 1 AS id,
ARRAY<STRUCT<a STRING, b STRING, cc STRING>>[('a1', 'b1', 'c1')] AS x
),
t2 AS (
SELECT 1 AS id,
ARRAY<STRUCT<a STRING, b STRING, c STRING>>[('a2', 'b2', 'c2')] AS y
)
SELECT x, y,
ARRAY_CONCAT(ARRAY(SELECT AS STRUCT a, b, cc AS c FROM UNNEST(x)), y) AS z
FROM t1 JOIN t2 USING(id)
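because cc is aliased to c on the fly, which makes the schemas not merely similar in layout but identical.
Hope this helps now.
If you have trouble applying the solution above to your own example, see below :o)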
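When the field to rename is the last one in the STRUCT, the alias can be written without listing every other field. This is a hedged variation on the query above, not something from the original answer: it relies on SELECT * EXCEPT, and because the renamed field is re-added at the end, it only yields a matching type when that field is already in last position (field order is part of the STRUCT type).

```sql
#standardSQL
WITH t1 AS (
  SELECT 1 AS id,
    ARRAY<STRUCT<a STRING, b STRING, cc STRING>>[('a1', 'b1', 'c1')] AS x
),
t2 AS (
  SELECT 1 AS id,
    ARRAY<STRUCT<a STRING, b STRING, c STRING>>[('a2', 'b2', 'c2')] AS y
)
SELECT
  ARRAY_CONCAT(
    -- * EXCEPT (cc) keeps a and b in their original order;
    -- cc is then appended as the last field under the name c.
    ARRAY(SELECT AS STRUCT * EXCEPT (cc), cc AS c FROM UNNEST(x)),
    y
  ) AS z
FROM t1 JOIN t2 USING (id)
```

For a STRUCT where the renamed fields sit in the middle, as in the question's schema, listing the fields explicitly in their original order remains the safer choice.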
SELECT
t1.rsrvtn_array a,
t2.reservations_filled b,
ARRAY_CONCAT(
ARRAY(
SELECT AS STRUCT
reservation_id,
s1_order_id,
s2_order_id,
amount,
created AS created_time,
updated AS updated_time,
state,
rate,
u_amount,
u_fees
FROM UNNEST(t1.rsrvtn_array)
) , t2.reservations_filled) AS c
FROM temp.new_orders t1
JOIN temp.order_history t2
USING(order_id)
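A related point from the original question: its CASE expression returns NULL whenever either array is NULL, which is also what ARRAY_CONCAT does by itself when any argument is NULL. If a NULL array should instead be treated as empty, it can be replaced with a typed empty array first. The sketch below reuses the three-field STRUCT from the first example; the column aliases are made up for illustration.

```sql
#standardSQL
WITH data AS (
  SELECT
    ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r5', 's5', 'b5')] AS x1,
    CAST(NULL AS ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>) AS x2
)
SELECT
  -- Plain concatenation: NULL because x2 is NULL.
  ARRAY_CONCAT(x1, x2) IS NULL AS plain_is_null,
  -- Guarded concatenation: a NULL x2 is swapped for an empty array
  -- of the same STRUCT type before concatenating.
  ARRAY_LENGTH(
    ARRAY_CONCAT(
      x1,
      IFNULL(x2, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[])
    )
  ) AS guarded_length
FROM data
```

The empty array is written with an explicit type so that both IFNULL arguments have the same STRUCT field names, for the same reason the concatenation itself needs matching names.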
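【Discussion】:
Thanks, I tried the example and it worked for me too. However, my original query still fails even though the signatures are the same. Below are the two array definitions:
Did you mean to provide the array definitions in the comment? Please provide the full query and more details, otherwise I cannot help you.
I have just added the query to the main post as well. Thanks a lot for your help.
See the solution added to the answer.
Thanks. I had wrongly assumed that the signature meant field types only (matching by field type). I did not know it would also try to match by field name. Thanks for the explanation. It works now.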