How do you expand an array on BigQuery to add columns to the existing table
Posted: 2021-05-12 09:44:23
Question:
I'm currently having trouble expanding an array stored in a BigQuery table.
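Is it possible to create another table or view that expands all the items in the msg column into separate columns? I've provided a screenshot below.
How can I do this?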
Here is a query that returns all the columns in my table:
SELECT * FROM `aftership.shipments`
Here is the table schema:
Field name,Type,Mode
_id,STRING,NULLABLE
_index,INTEGER,NULLABLE
_created,TIMESTAMP,NULLABLE
_fivetran_synced,TIMESTAMP,NULLABLE
_ip,STRING,NULLABLE
event,STRING,NULLABLE
event_id,STRING,NULLABLE
is_tracking_first_tag,BOOLEAN,NULLABLE
msg,STRING,NULLABLE
ts,INTEGER,NULLABLE
Sample data from the msg column looks like this:
{
  "id": "gynv1fsa8m7amkoct4zvx00w",
  "tracking_number": "KEX99999999",
  "title": "#xx589",
  "note": null,
  "origin_country_iso3": "THA",
  "destination_country_iso3": "THA",
  "courier_destination_country_iso3": "THA",
  "shipment_package_count": null,
  "active": false,
  "order_id": "3753237577781",
  "order_id_path": null,
  "order_date": "2021-05-06T01:09:01Z",
  "customer_name": "คุณx xxx",
  "source": "shopify",
  "emails": ["jonappleseed@gmail.com"],
  "smses": ["+669999999"],
  "subscribed_smses": [],
  "subscribed_emails": [],
  "android": [],
  "ios": [],
  "return_to_sender": false,
  "custom_fields": {
    "item_names": "เซต x & x x x x 1"
  },
  "tag": "Delivered",
  "subtag": "Delivered_001",
  "subtag_message": "Delivered",
  "tracked_count": 26,
  "expected_delivery": null,
  "signed_by": "คุณxxx xx #32589",
  "shipment_type": null,
  "created_at": "2021-05-06T11:29:54+00:00",
  "updated_at": "2021-05-07T09:00:32+00:00",
  "slug": "kerry-logistics",
  "unique_token": "deprecated",
  "path": "deprecated",
  "shipment_weight": null,
  "shipment_weight_unit": null,
  "delivery_time": 2,
  "last_m…
[Screenshot: the aftership.shipments table]
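Comments:
Your question is unclear. Which column do you want to add? What is the logic? Are all of these columns needed to answer the question? Perhaps you could ask a clearly explained question, with sample data and the desired results (simplified if necessary), to illustrate the logic.
@GordonLinoff Hi, what I want is to take a string like the one above (an example of one row in the "msg" column) and turn its fields into columns in another table.
Answer 1:
What you can do is convert your JSON string into a STRUCT containing the fields you are interested in (the STRUCT is not required, but it keeps things organized):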
WITH sample AS (
SELECT "1" AS id, "{\"id\":\"gynv1fsa8m7amkoct4zvx00w\",\"tracking_number\":\"KEX99999999\",\"title\":\"#xx589\",\"note\":null,\"origin_country_iso3\":\"THA\",\"destination_country_iso3\":\"THA\",\"courier_destination_country_iso3\":\"THA\",\"shipment_package_count\":null,\"active\":false,\"order_id\":\"3753237577781\",\"order_id_path\":null,\"order_date\":\"2021-05-06T01:09:01Z\",\"customer_name\":\"\u0e04\u0e38\u0e13x xxx\",\"source\":\"shopify\",\"emails\":[\"jonappleseed@gmail.com\", \"jonappleseed@yahoo.com\",\"jonappleseed@outlook.com\"],\"smses\":[\"+669999999\"],\"subscribed_smses\":[],\"subscribed_emails\":[],\"android\":[],\"ios\":[],\"return_to_sender\":false,\"custom_fields\":{\"item_names\":\"\u0e40\u0e0b\u0e15 x & x x x x 1\"},\"tag\":\"Delivered\",\"subtag\":\"Delivered_001\",\"subtag_message\":\"Delivered\",\"tracked_count\":26,\"expected_delivery\":null,\"signed_by\":\"\u0e04\u0e38\u0e13xxx xx #32589\",\"shipment_type\":null,\"created_at\":\"2021-05-06T11:29:54+00:00\",\"updated_at\":\"2021-05-07T09:00:32+00:00\",\"slug\":\"kerry-logistics\",\"unique_token\":\"deprecated\",\"path\":\"deprecated\",\"shipment_weight\":null,\"shipment_weight_unit\":null,\"delivery_time\":2}" AS msg
)
SELECT id,
STRUCT(JSON_VALUE(msg, '$.id') AS mid,
JSON_VALUE(msg, '$.title') AS title,
JSON_VALUE(msg, '$.tag') AS tag,
JSON_VALUE(msg, '$.active') AS is_active,
JSON_EXTRACT_ARRAY(JSON_QUERY(msg, '$.emails')) AS emails,
JSON_QUERY(msg, '$.custom_fields') AS custom_fields) AS msg
FROM sample
This returns each id together with a msg STRUCT that holds the extracted fields.
(The extra email addresses are not in your original data. I added them to show that you can extract an array in the JSON string into an ARRAY inside the STRUCT.)
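If you want to expand the array into rows (one row per email address) rather than keep it nested, a sketch along these lines should work. It assumes msg holds valid JSON text with a top-level "emails" array of strings. It uses JSON_VALUE_ARRAY, which returns the elements as plain STRING values, whereas JSON_EXTRACT_ARRAY keeps the surrounding JSON quotes. The alias email is only an illustrative name.

```sql
-- Sketch: flatten the "emails" array into one row per (shipment, email).
-- Assumes msg contains valid JSON with a top-level "emails" array of strings.
SELECT
  s._id,
  email  -- one unquoted STRING element of the emails array
FROM `aftership.shipments` AS s
CROSS JOIN UNNEST(JSON_VALUE_ARRAY(s.msg, '$.emails')) AS email;
```

A CROSS JOIN with UNNEST drops shipments whose emails array is empty. Using LEFT JOIN UNNEST(...) instead keeps them, with a NULL email.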
Just follow this template and add one line for each field you are interested in:
JSON_VALUE(msg, '$.<FIELD>') AS field_column
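Note: use JSON_VALUE for scalar values (strings, numbers, booleans) and JSON_QUERY for objects or arrays. JSON_VALUE returns NULL for non-scalar values.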
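To get what the question asks for, which is a separate view whose columns are the msg fields, the same template can be placed directly in a CREATE VIEW statement. This is a minimal sketch, assuming msg holds valid JSON. The view name aftership.shipments_flat and the selected fields are illustrative choices. It applies JSON_VALUE to scalars and JSON_QUERY to the object and array fields, and uses SAFE_CAST to produce typed columns.

```sql
-- Sketch: expose selected msg fields as top-level columns of a view.
-- The view name `aftership.shipments_flat` is hypothetical.
CREATE OR REPLACE VIEW `aftership.shipments_flat` AS
SELECT
  _id,
  -- JSON_VALUE: scalar fields, returned as STRING
  JSON_VALUE(msg, '$.tracking_number') AS tracking_number,
  JSON_VALUE(msg, '$.tag') AS tag,
  JSON_VALUE(msg, '$.slug') AS slug,
  -- SAFE_CAST returns NULL instead of raising an error if a value does not convert
  SAFE_CAST(JSON_VALUE(msg, '$.active') AS BOOL) AS is_active,
  SAFE_CAST(JSON_VALUE(msg, '$.tracked_count') AS INT64) AS tracked_count,
  -- JSON_QUERY: object/array fields, returned as JSON-formatted STRING
  JSON_QUERY(msg, '$.custom_fields') AS custom_fields,
  JSON_QUERY(msg, '$.emails') AS emails_json
FROM `aftership.shipments`;
```

If you want a materialized copy instead of a view, CREATE OR REPLACE TABLE ... AS followed by the same SELECT works the same way.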