How do you expand an array on BigQuery to add columns to the existing table

Posted 2023-03-24

Asked: 2021-05-12 09:44:23

Question:

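I am currently having trouble expanding an array stored in a BigQuery table.

Is it possible to create another table or view that expands all the items in the msg column into separate columns? I have provided a screenshot below.

How would I go about doing this?

Here is a query that returns every row and column of my table: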

SELECT * FROM `aftership.shipments`

Here is the list of data types:

Field name, Type, Mode

_id,STRING,NULLABLE
_index,INTEGER,NULLABLE
_created,TIMESTAMP,NULLABLE
_fivetran_synced,TIMESTAMP,NULLABLE
_ip,STRING,NULLABLE
event,STRING,NULLABLE
event_id,STRING,NULLABLE
is_tracking_first_tag,BOOLEAN,NULLABLE
msg,STRING,NULLABLE
ts,INTEGER,NULLABLE

Sample data in the msg column looks like this:

{
    "id": "gynv1fsa8m7amkoct4zvx00w",
    "tracking_number": "KEX99999999",
    "title": "#xx589",
    "note": null,
    "origin_country_iso3": "THA",
    "destination_country_iso3": "THA",
    "courier_destination_country_iso3": "THA",
    "shipment_package_count": null,
    "active": false,
    "order_id": "3753237577781",
    "order_id_path": null,
    "order_date": "2021-05-06T01:09:01Z",
    "customer_name": "คุณx xxx",
    "source": "shopify",
    "emails": ["jonappleseed@gmail.com"],
    "smses": ["+669999999"],
    "subscribed_smses": [],
    "subscribed_emails": [],
    "android": [],
    "ios": [],
    "return_to_sender": false,
    "custom_fields": {
        "item_names": "เซต x & x x x x 1"
    },
    "tag": "Delivered",
    "subtag": "Delivered_001",
    "subtag_message": "Delivered",
    "tracked_count": 26,
    "expected_delivery": null,
    "signed_by": "คุณxxx xx #32589",
    "shipment_type": null,
    "created_at": "2021-05-06T11:29:54+00:00",
    "updated_at": "2021-05-07T09:00:32+00:00",
    "slug": "kerry-logistics",
    "unique_token": "deprecated",
    "path": "deprecated",
    "shipment_weight": null,
    "shipment_weight_unit": null,
    "delivery_time": 2,
    "last_m…

[Screenshot: aftership.shipments table]

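Comments:

Your question is unclear. Which column do you want to add? What is the logic? Are all of those columns needed to answer the question? Perhaps you could ask a clearly explained question, with sample data and desired results (possibly simplified) that illustrate the logic.

@GordonLinoff Hi, what I want is to take the string shown above, which is a sample of one row of the "msg" column, and turn it into columns in another table.

Answer 1:

What you can do is convert your JSON string into a STRUCT containing the fields you are interested in (the STRUCT is not required, but it keeps things organized):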

-- One sample row. The JSON text must be a complete object: the outer braces and
-- the braces around custom_fields are required, otherwise the JSON functions return NULL.
WITH sample AS (
    SELECT "1" AS id, "{\"id\":\"gynv1fsa8m7amkoct4zvx00w\",\"tracking_number\":\"KEX99999999\",\"title\":\"#xx589\",\"note\":null,\"origin_country_iso3\":\"THA\",\"destination_country_iso3\":\"THA\",\"courier_destination_country_iso3\":\"THA\",\"shipment_package_count\":null,\"active\":false,\"order_id\":\"3753237577781\",\"order_id_path\":null,\"order_date\":\"2021-05-06T01:09:01Z\",\"customer_name\":\"\u0e04\u0e38\u0e13x xxx\",\"source\":\"shopify\",\"emails\":[\"jonappleseed@gmail.com\", \"jonappleseed@yahoo.com\",\"jonappleseed@outlook.com\"],\"smses\":[\"+669999999\"],\"subscribed_smses\":[],\"subscribed_emails\":[],\"android\":[],\"ios\":[],\"return_to_sender\":false,\"custom_fields\":{\"item_names\":\"\u0e40\u0e0b\u0e15 x & x x x x 1\"},\"tag\":\"Delivered\",\"subtag\":\"Delivered_001\",\"subtag_message\":\"Delivered\",\"tracked_count\":26,\"expected_delivery\":null,\"signed_by\":\"\u0e04\u0e38\u0e13xxx xx #32589\",\"shipment_type\":null,\"created_at\":\"2021-05-06T11:29:54+00:00\",\"updated_at\":\"2021-05-07T09:00:32+00:00\",\"slug\":\"kerry-logistics\",\"unique_token\":\"deprecated\",\"path\":\"deprecated\",\"shipment_weight\":null,\"shipment_weight_unit\":null,\"delivery_time\":2}" AS msg
)

SELECT
    id,
    STRUCT(
        JSON_VALUE(msg, '$.id') AS mid,                               -- scalar fields come back as STRING
        JSON_VALUE(msg, '$.title') AS title,
        JSON_VALUE(msg, '$.tag') AS tag,
        JSON_VALUE(msg, '$.active') AS is_active,                     -- the text 'false', not a BOOL
        JSON_EXTRACT_ARRAY(JSON_QUERY(msg, '$.emails')) AS emails,    -- JSON array -> ARRAY<STRING>
        JSON_QUERY(msg, '$.custom_fields') AS custom_fields           -- nested object kept as JSON text
    ) AS msg
FROM sample

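This gives one row in which msg is a STRUCT with the fields mid, title, tag, is_active, emails and custom_fields.

(The additional emails are not in the original message; I added them to show that you can extract an array from the JSON string into an ARRAY inside the STRUCT.)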
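If the goal is one output row per array element rather than an ARRAY column, the extracted array can be flattened with UNNEST. The following is a minimal sketch under assumptions: the sample row is a cut-down, made-up version of the message above, and JSON_VALUE_ARRAY is used instead of the answer's JSON_EXTRACT_ARRAY so that the elements come back without their JSON quotes.

```sql
-- Hypothetical, shortened sample row; only the emails array is kept.
WITH sample AS (
  SELECT
    '1' AS id,
    '{"emails":["jonappleseed@gmail.com","jonappleseed@yahoo.com"]}' AS msg
)
SELECT
  id,
  email                       -- one row per element of $.emails
FROM sample
CROSS JOIN UNNEST(JSON_VALUE_ARRAY(msg, '$.emails')) AS email;
```

With a CROSS JOIN, rows whose array is empty disappear from the result; using LEFT JOIN UNNEST(...) instead keeps them with a NULL email.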

You just need to add one line for each field you are interested in, following this template:

JSON_VALUE(msg, '$.<FIELD>') AS field_column
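Applied to the real table, the template could be wrapped in a view, since the question asks for another table or view. This is only a sketch: the view name `aftership.shipments_expanded` and the choice of fields are illustrative assumptions, and it presumes that every msg value holds a complete JSON object.

```sql
-- Hypothetical view name; the selected fields are only examples.
CREATE OR REPLACE VIEW `aftership.shipments_expanded` AS
SELECT
  _id,
  JSON_VALUE(msg, '$.id') AS msg_id,
  JSON_VALUE(msg, '$.tracking_number') AS tracking_number,
  JSON_VALUE(msg, '$.tag') AS tag,
  -- JSON_VALUE returns STRING, so numeric fields need a cast;
  -- SAFE_CAST yields NULL instead of an error on bad input.
  SAFE_CAST(JSON_VALUE(msg, '$.tracked_count') AS INT64) AS tracked_count,
  -- Nested fields are reached with a dotted JSON path.
  JSON_VALUE(msg, '$.custom_fields.item_names') AS item_names
FROM `aftership.shipments`;
```

A view re-parses the JSON on every query. Replacing `CREATE OR REPLACE VIEW` with `CREATE OR REPLACE TABLE` would store the result once instead, at the cost of having to refresh it when new rows arrive.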

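Note: whether to use JSON_VALUE or JSON_QUERY depends on what you want. JSON_VALUE extracts a scalar and returns it as a STRING without the JSON quotes; JSON_QUERY returns the selected value, including objects and arrays, as JSON-formatted text.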
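A quick way to see how the two functions differ is to run both against the same paths. The one-row input below is a made-up, shortened message, and the trailing comments show the results I would expect from the documented behavior of these functions.

```sql
SELECT
  JSON_VALUE(msg, '$.tag') AS tag_value,            -- Delivered
  JSON_QUERY(msg, '$.tag') AS tag_query,            -- "Delivered" (quotes kept)
  JSON_VALUE(msg, '$.custom_fields') AS cf_value,   -- NULL, the path is not a scalar
  JSON_QUERY(msg, '$.custom_fields') AS cf_query    -- {"item_names":"set 1"}
FROM (
  -- Hypothetical sample message with one scalar and one nested object.
  SELECT '{"tag":"Delivered","custom_fields":{"item_names":"set 1"}}' AS msg
);
```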
