BigQuery Materialized View of a STRUCT

【Title】: BigQuery Materialized View of a STRUCT 【Posted】: 2020-04-14 21:19:00 【Question】:

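We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming inserts of web activity, is multi-tenant, and makes heavy use of BQ's nested column structures.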

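We want to create a subset of this table so that queries run more efficiently and in near real time, with minimal administrative overhead. The simplest solution we could think of is a materialized view that is just a subset of the rows (by client) and columns, but materialized views currently require an aggregation.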

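In addition, the materialized views beta supports only a limited set of aggregate functions and does not support subselects or UNNEST operations. We have not found a good way to extract a deeply nested STRUCT into a materialized view. A simple example: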

SELECT 
  '7602E3E96349E972' as session_id,
  '084F0262' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SAVE50'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT 
  '7602E3E96349E972' as session_id,
  '01ECB6EF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SPRING','LOVE'] as value), 
      STRUCT(
        'discounts' as name,
        ['14.99','6.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT 
  '508082BC49BAC09F' as session_id,
  '038B67CF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['FREESHIP','HOLIDAY25'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT 
  'C88AE153C784D910' as session_id,
  'EA716BD2' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['CYBER'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99','19.99'] as value)
    ] as modifiers
  )]

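Ideally we would keep this STRUCT as-is. We are trying to accomplish something like the following in a materialized view (recognizing that these are not supported MV features):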

SELECT
session_id,
transaction_id,
ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name,mods_array.value)) as modifiers
FROM data,
UNNEST(contexts_transaction) trans_array,
UNNEST(trans_array.modifiers) mods_array
GROUP BY 1,2
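
For reference, the query above does run as an ordinary (non-materialized) query once the sample rows are given the name `data`. The following is a minimal sketch under that assumption: it wraps two of the four sample rows in a CTE for brevity and spells out the GROUP BY columns. It only illustrates the target shape of the result and does not work around the MV restrictions.

```sql
-- Sketch: two of the question's sample rows wrapped in a CTE named `data`,
-- queried as a regular SELECT (not as a materialized view).
WITH data AS (
  SELECT
    '7602E3E96349E972' AS session_id,
    '084F0262' AS transaction_id,
    [STRUCT(
      [STRUCT('promotions' AS name, ['SAVE50'] AS value),
       STRUCT('discounts' AS name, ['9.99'] AS value)] AS modifiers
    )] AS contexts_transaction
  UNION ALL
  SELECT
    '7602E3E96349E972',
    '01ECB6EF',
    [STRUCT(
      [STRUCT('promotions' AS name, ['SPRING', 'LOVE'] AS value),
       STRUCT('discounts' AS name, ['14.99', '6.99'] AS value)] AS modifiers
    )]
)
SELECT
  session_id,
  transaction_id,
  -- Rebuild one flat array of (name, value) structs per transaction.
  ARRAY_AGG(STRUCT(mods_array.name AS name, mods_array.value AS value)) AS modifiers
FROM data,
  UNNEST(contexts_transaction) AS trans_array,
  UNNEST(trans_array.modifiers) AS mods_array
GROUP BY session_id, transaction_id;
```

Each output row carries a `modifiers` array with one struct per modifier name, which is the flattened form the question would like the materialized view to hold.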

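We are open to any approach for subsetting this huge table, not just an MV, as long as it offers the same benefits (low maintenance, automatic, low cost). Any suggestions are appreciated!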

【Comments】:

Can you provide a sample input and the expected output? Please edit the question and show us the query you want to run.

【Answer 1】:

As far as I understand from your question, you want an output similar to what this query produces:

with rawdata AS
(
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select 
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;

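So the input table looks like this (one row per SELECT in rawdata above):

userid  transactions
1       transactionIds: [ABCDEF]; couponIds: [123456]
1       transactionIds: [XYZ, KLM]; couponIds: [789, 567]
2       transactionIds: [XY, KL]; couponIds: [10, 15]
2       transactionIds: [X, K]; couponIds: [20, 25]

And the output table looks like this (the order of elements within each array is not guaranteed):

userid  transactionIds       couponIds
1       [ABCDEF, XYZ, KLM]   [123456, 789, 567]
2       [XY, KL, X, K]       [10, 15, 20, 25]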
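
To make the role of the correlated subquery in the query above easier to see, here is a small sketch that runs it on a single sample row, without the aggregation. For each row it picks the `value` array of the one struct whose `name` matches; ARRAY_CONCAT_AGG then concatenates those per-row arrays within each `userid` group.

```sql
-- Sketch: the per-row step of the answer's query, before ARRAY_CONCAT_AGG.
WITH rawdata AS (
  SELECT
    1 AS userid,
    [STRUCT('transactionIds' AS name, ['XYZ', 'KLM'] AS value),
     STRUCT('couponIds' AS name, ['789', '567'] AS value)] AS transactions
)
SELECT
  userid,
  -- Scalar subquery: returns the matching struct's value array for this row.
  (SELECT trx.value FROM UNNEST(transactions) AS trx
   WHERE trx.name = 'transactionIds') AS transactionIds,
  (SELECT trx.value FROM UNNEST(transactions) AS trx
   WHERE trx.name = 'couponIds') AS couponIds
FROM rawdata;
```

Because these are scalar subqueries, the approach relies on each `name` appearing at most once per row; a row with two structs of the same name would make the subquery return more than one row and fail.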

If your intent is different, please say so in the question and provide more details.

To test this, I tried creating that query as a materialized view:

create or replace table project.dataset.rawdata as 
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;

create materialized view project.dataset.mview as 
select 
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid

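However, I got the error "Unsupported aggregation function in materialized view: array_concat_agg." Since materialized views are still in beta, we don't know whether this will be supported in the future, but it cannot be done with the current capabilities.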
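
Since the same SELECT is accepted as an ordinary query, one possible fallback is to materialize the result by hand. The following is only a sketch: the destination table name `project.dataset.rawdata_by_user` is hypothetical, and it assumes the statement is re-run periodically (for example as a scheduled query). Unlike a materialized view, the table is not refreshed automatically and every run rescans the source table, so it does not give the low-maintenance, low-cost behaviour the question asks for.

```sql
-- Sketch: rebuild a derived table from the aggregation the MV rejected.
-- `project.dataset.rawdata_by_user` is a hypothetical destination table.
CREATE OR REPLACE TABLE project.dataset.rawdata_by_user AS
SELECT
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) AS trx
                    WHERE trx.name = 'transactionIds')) AS transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) AS trx
                    WHERE trx.name = 'couponIds')) AS couponIds
FROM project.dataset.rawdata
GROUP BY userid;
```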

@fhoffa may be able to provide more information.

【Comments】:

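Thank you, Sabri and Felipe. Excellent feedback, and a good example of the detail you are looking for. We are editing the question now.

I've edited the question - hopefully my intent is clearer now!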
