BigQuery: How to extract only certain field from REPEATED records as another REPEATED field

【Posted】: 2018-11-04 22:36:24

【Question】:

Here is a sample table in BigQuery:

WITH test AS (
  SELECT
    [ 
      STRUCT("Rudisha" as name, 123 as id),
      STRUCT("Murphy" as name, 124 as id),
      STRUCT("Bosse" as name, 125 as id),
      STRUCT("Rotich" as name,  126 as id)
    ] AS data

    UNION ALL SELECT
    [
      STRUCT("Lewandowski" as name, 127 as id),
      STRUCT("Kipketer" as name, 128 as id),
      STRUCT("Berian" as name, 129 as id)
    ] AS data
)

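Here I want to extract the "id" field from the record field ("data") as a REPEATED field. The number of rows should stay the same, but each row should contain only an ids field of REPEATED type: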

ids: [123, 124, 125, 126]
ids: [127, 128, 129]

How can I do this?

【Answer 1】:

The following is for BigQuery Standard SQL:

#standardSQL
WITH test AS (
  SELECT
    [ 
      STRUCT("Rudisha" AS name, 123 AS id),
      STRUCT("Murphy" AS name, 124 AS id),
      STRUCT("Bosse" AS name, 125 AS id),
      STRUCT("Rotich" AS name,  126 AS id)
    ] AS data
    UNION ALL SELECT
    [
      STRUCT("Lewandowski" AS name, 127 AS id),
      STRUCT("Kipketer" AS name, 128 AS id),
      STRUCT("Berian" AS name, 129 AS id)
    ] AS data
)
-- For each row, unnest the data array and collect only the id values into a new array
SELECT ARRAY(SELECT id FROM UNNEST(data)) AS ids
FROM test
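
The same ARRAY(SELECT ... FROM UNNEST(...)) pattern can be extended in two ways; the query below is a minimal sketch, not part of the original answer. It assumes a shortened copy of the sample table, and the aliases d and pos and the output column names are illustrative choices. Because UNNEST is not guaranteed to preserve element order, WITH OFFSET plus ORDER BY restores it explicitly, and SELECT AS STRUCT keeps a chosen subset of fields as a REPEATED RECORD rather than a plain REPEATED scalar.

```sql
#standardSQL
WITH test AS (
  -- Shortened copy of the sample table from the question
  SELECT [STRUCT("Rudisha" AS name, 123 AS id), STRUCT("Murphy" AS name, 124 AS id)] AS data
  UNION ALL
  SELECT [STRUCT("Lewandowski" AS name, 127 AS id), STRUCT("Kipketer" AS name, 128 AS id)] AS data
)
SELECT
  -- WITH OFFSET exposes each element's position, so ORDER BY pos keeps the original order
  ARRAY(SELECT d.id FROM UNNEST(data) AS d WITH OFFSET AS pos ORDER BY pos) AS ids,
  -- SELECT AS STRUCT yields an array of structs containing only the selected field(s)
  ARRAY(SELECT AS STRUCT d.name FROM UNNEST(data) AS d WITH OFFSET AS pos ORDER BY pos) AS names
FROM test
```

As in the answer above, the row count of test is unchanged, since each ARRAY subquery is evaluated once per input row.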
