BigQuery Materialized View of a STRUCT
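【Title】: BigQuery Materialized View of a STRUCT
【Posted】: 2020-04-14 21:19:00
【Question】:
We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming inserts of web activity, is multi-tenant, and makes heavy use of BQ's nested columnar structure.
We would like to create a subset of this table so that queries run more efficiently and in near real time, with minimal administrative overhead. We thought the simplest solution would be a materialized view that is just a subset of rows (by client) and columns, but materialized views currently require aggregation.
In addition, the materialized view beta supports only a limited set of aggregate functions and does not support subselects or UNNEST operations. We have not found a good way to extract a deeply nested STRUCT into a materialized view. A simple example: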
SELECT
'7602E3E96349E972' as session_id,
'084F0262' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SAVE50'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'7602E3E96349E972' as session_id,
'01ECB6EF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SPRING','LOVE'] as value),
STRUCT(
'discounts' as name,
['14.99','6.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'508082BC49BAC09F' as session_id,
'038B67CF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['FREESHIP','HOLIDAY25'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'C88AE153C784D910' as session_id,
'EA716BD2' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['CYBER'] as value),
STRUCT(
'discounts' as name,
['9.99','19.99'] as value)
] as modifiers
)] as contexts_transaction
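Ideally we would keep this STRUCT as is. We are trying to accomplish something like the following in a materialized view (recognizing that these are not supported MV features):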
SELECT
  session_id,
  transaction_id,
  ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name, mods_array.value)) as modifiers
FROM data,
  UNNEST(contexts_transaction) trans_array,
  UNNEST(trans_array.modifiers) mods_array
GROUP BY 1, 2
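To try this query as an ordinary (non-materialized) SELECT, the sample rows can be wrapped in a CTE. The sketch below is trimmed to two of the sample rows and assumes the CTE is named `data`, the name the query already references. It only illustrates the intended result shape and does not work around the MV restrictions.

```sql
-- Sketch only: two of the sample rows wrapped in a CTE named `data`.
WITH data AS (
  SELECT
    '7602E3E96349E972' AS session_id,
    '084F0262' AS transaction_id,
    [STRUCT(
      [STRUCT('promotions' AS name, ['SAVE50'] AS value),
       STRUCT('discounts' AS name, ['9.99'] AS value)] AS modifiers
    )] AS contexts_transaction
  UNION ALL
  SELECT
    '7602E3E96349E972',
    '01ECB6EF',
    [STRUCT(
      [STRUCT('promotions' AS name, ['SPRING', 'LOVE'] AS value),
       STRUCT('discounts' AS name, ['14.99', '6.99'] AS value)] AS modifiers
    )]
)
-- Flatten both array levels, then rebuild one modifiers array per transaction.
SELECT
  session_id,
  transaction_id,
  ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name, mods_array.value)) AS modifiers
FROM data,
  UNNEST(contexts_transaction) AS trans_array,
  UNNEST(trans_array.modifiers) AS mods_array
GROUP BY 1, 2;
```

Each (session_id, transaction_id) pair yields one row whose modifiers column is an array of name/value structs. ARRAY_AGG has no ORDER BY here, so the order of elements within that array is not guaranteed.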
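We are open to any approach to subsetting this large table, not just an MV, but we would like it to have the same benefits (low maintenance, automatic, low cost). Any suggestions are appreciated!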
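【Comments】:
Can you provide a sample input and the expected output?
Please edit the question and show us the query you want to run.
【Answer 1】:
As far as I understand from your question, you want output similar to this: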
with rawdata AS
(
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;
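So the input table consists of the four rawdata rows above, while the output table has one row per userid, with the transactionIds and couponIds values each concatenated into a single array.
If your intent is different, please say so in the question and provide more details.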
To do this, I tried creating that query as a materialized view.
create or replace table project.dataset.rawdata as
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;
create materialized view project.dataset.mview as
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid
However, I got the error: Unsupported aggregation function in materialized view: array_concat_agg.
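Since materialized views are still in beta, we don't know whether this will be supported in the future. With the current capabilities, however, it cannot be done.
@fhoffa may be able to provide more information.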
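For comparison, the same SELECT can be defined as a standard (logical) view, because the aggregate-function restriction in the error above is specific to materialized views. This is a sketch, not a substitute for an MV: a logical view is re-evaluated each time it is queried, so it gives none of the precomputation or cost benefits the question asks for. The view name `project.dataset.lview` is hypothetical. `project.dataset.rawdata` is the table created above.

```sql
-- Assumption: `lview` is a hypothetical name for a standard (logical) view.
-- Same query as the failed materialized view; only the DDL keyword differs.
CREATE OR REPLACE VIEW project.dataset.lview AS
SELECT
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) AS transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) AS couponIds
FROM project.dataset.rawdata
GROUP BY userid;
```

Querying the view runs the aggregation against the base table each time, so the bytes scanned are those of the underlying table rather than of a precomputed result.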
【Comments】:
Thanks, Sabri and Felipe. Excellent feedback, and an example of the level of detail you are looking for. We are editing the question now.
I have edited the question. Hopefully my intent is clearer now!