Unnest a JSON stringified array in BigQuery
[Posted]: 2020-07-26 12:36:32
[Question]: I have the following table in Google BigQuery:
+-------+---------+-------+
| Name  | City    | items |
+-------+---------+-------+
| James | Dallas  | [{'text': 'pear', 'line_total_excl_vat': '24','product_id': 100}] |
| John  | Chicago | [{'text': 'apple', 'line_total_excl_vat': '29','product_id': 200},{'text': 'banana', 'line_total_excl_vat': '34','product_id': 300}] |
+-------+---------+-------+
I am trying to get a result like this:
+-------+---------+--------+---------------------+------------+
| Name  | City    | text   | line_total_excl_vat | product_id |
+-------+---------+--------+---------------------+------------+
| James | Dallas  | pear   | 24                  | 100        |
| John  | Chicago | apple  | 29                  | 200        |
| John  | Chicago | banana | 34                  | 300        |
+-------+---------+--------+---------------------+------------+
The "items" column is actually a string. Is there a way to unnest this data format and get the view I want in BigQuery? Thanks!
[Comments]:
Do you know the column names? If not, you can't do this with a simple select.
Yes, I know the column names.
[Answer 1]:
Below is for BigQuery Standard SQL:
#standardSQL
SELECT Name, City,
JSON_EXTRACT_SCALAR(json, '$.text') AS text,
JSON_EXTRACT_SCALAR(json, '$.line_total_excl_vat') AS line_total_excl_vat,
JSON_EXTRACT_SCALAR(json, '$.product_id') AS product_id
FROM `project.dataset.table`,
UNNEST(JSON_EXTRACT_ARRAY(items,'$')) json
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'James' AS Name, 'Dallas' AS City, "[{'text': 'pear', 'line_total_excl_vat': '24','product_id': 100}]" AS items UNION ALL
SELECT 'John', 'Chicago', "[{'text': 'apple', 'line_total_excl_vat': '29','product_id': 200},{'text': 'banana', 'line_total_excl_vat': '34','product_id': 300}]"
)
SELECT Name, City,
JSON_EXTRACT_SCALAR(json, '$.text') AS text,
JSON_EXTRACT_SCALAR(json, '$.line_total_excl_vat') AS line_total_excl_vat,
JSON_EXTRACT_SCALAR(json, '$.product_id') AS product_id
FROM `project.dataset.table`,
UNNEST(JSON_EXTRACT_ARRAY(items,'$')) json
The output is:
Row  Name   City     text    line_total_excl_vat  product_id
1    James  Dallas   pear    24                   100
2    John   Chicago  apple   29                   200
3    John   Chicago  banana  34                   300
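One thing to keep in mind: JSON_EXTRACT_SCALAR returns STRING values, so all three extracted columns in the query above are strings. If numeric columns are needed, one option is to wrap each extraction in SAFE_CAST. This is only a sketch: the target types NUMERIC and INT64 are assumptions (the question does not say which types are wanted), and the alias `item` is just an illustrative name.

```sql
#standardSQL
SELECT Name, City,
  JSON_EXTRACT_SCALAR(item, '$.text') AS text,
  -- SAFE_CAST yields NULL instead of an error when a value cannot be converted
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.line_total_excl_vat') AS NUMERIC) AS line_total_excl_vat,
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.product_id') AS INT64) AS product_id
FROM `project.dataset.table`,
  UNNEST(JSON_EXTRACT_ARRAY(items, '$')) AS item
```

Using SAFE_CAST rather than CAST means a single malformed element does not fail the whole query; rows with unparseable values simply show NULL in that column.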
[Answer 2]: Combining json_extract and json_extract_array with unnest() takes a bit of fiddling:
WITH t AS (
SELECT 'James' as Name, 'Dallas' AS City, "[{'text': 'pear', 'line_total_excl_vat': '24','product_id': 100}]" AS items
UNION ALL
SELECT 'John', 'Chicago', "[{'text': 'apple', 'line_total_excl_vat': '29','product_id': 200},{'text': 'banana', 'line_total_excl_vat': '34','product_id': 300}]"
)
SELECT
# we'll unnest this array in the next statement and grab its elements
JSON_EXTRACT_ARRAY(items,'$') as arr
# unnest() turns the array into table format - the JSON functions extract fields from each row
,ARRAY(SELECT AS STRUCT
JSON_EXTRACT_SCALAR(i,'$.text') as text,
JSON_EXTRACT_SCALAR(i,'$.line_total_excl_vat') as line_total_excl_vat,
JSON_EXTRACT_SCALAR(i,'$.product_id') as product_id
FROM UNNEST(JSON_EXTRACT_ARRAY(items,'$')) as i
) AS unnested_items
,* # original fields for reference
FROM t
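This creates a nested output that you can work with later (see the JSON representation of the output, which is clearer). If you want to flatten the table, you can cross join the resulting array: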
WITH t AS (
# Name | City | items |
SELECT 'James' as Name, 'Dallas' AS City, "[{'text': 'pear', 'line_total_excl_vat': '24','product_id': 100}]" AS items
UNION ALL
SELECT 'John', 'Chicago', "[{'text': 'apple', 'line_total_excl_vat': '29','product_id': 200},{'text': 'banana', 'line_total_excl_vat': '34','product_id': 300}]"
)
SELECT
*
FROM t CROSS JOIN UNNEST(ARRAY((SELECT AS STRUCT
JSON_EXTRACT_SCALAR(i,'$.text') as text,
JSON_EXTRACT_SCALAR(i,'$.line_total_excl_vat') as line_total_excl_vat,
JSON_EXTRACT_SCALAR(i,'$.product_id') as product_id
FROM UNNEST(JSON_EXTRACT_ARRAY(items,'$')) as i
)))