How to import nested JSON into Google BigQuery

Posted 2023-03-25

[Original title]: How to import nested json into google big query [Asked]: 2019-09-30 18:28:13 [Question]:

I am loading JSON into Google BigQuery. The schema is at the bottom of the question.

Here is a sample JSON record:

{
    "_index":"data",
    "_type":"collection_v1",
    "_id":"548d035f23r8987b768a5e60",
    "_score":1,
    "_source":{
        "fullName":"Mike Smith",
        "networks":[
            {
                "id":[
                    "12923449"
                ],
                "network":"facebook",
                "link":"https://www.facebook.com/127654449"
            }
        ],
        "sex":{
            "network":"facebook",
            "value":"male"
        },
        "interests":{},
        "score":1.045,
        "merged_by":"548f899444v5t4v45te9a4cc"
    }
}

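As you can see, there is a "_source.fullName" field holding "Mike Smith". When I try to create a table from this data, the load fails with:

Array specified for non-repeated field: _source.fullName.

I believe this field occurs only once inside _source. How can I get past this error?

Here is the schema: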

[
    {"name": "_index", "type": "STRING", "mode": "NULLABLE"},
    {"name": "_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "_type", "type": "STRING", "mode": "NULLABLE"},
    {"name": "score", "type": "STRING", "mode": "NULLABLE"},
    {"name": "header", "type": "STRING", "mode": "NULLABLE"},
    {"name": "fullName", "type": "STRING", "mode": "NULLABLE"},
    {"name": "src", "type": "STRING", "mode": "NULLABLE"},
    {"name": "avatar", "type": "STRING", "mode": "NULLABLE"},
    {"name": "merged_by", "type": "STRING", "mode": "NULLABLE"},
    {"name": "cover", "type": "STRING", "mode": "NULLABLE"},
    {"name": "sex", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "network", "type": "STRING", "mode": "NULLABLE"},
        {"name": "value", "type": "STRING", "mode": "NULLABLE"}
    ]},
    {"name": "_source", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "fullName", "type": "STRING", "mode": "NULLABLE"},
        {"name": "links", "type": "STRING", "mode": "REPEATED"},
        {"name": "birthday", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "value", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "phones", "type": "STRING", "mode": "REPEATED"},
        {"name": "pictures", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "url", "type": "STRING", "mode": "NULLABLE"},
            {"name": "tab", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "contacts", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "fullName", "type": "STRING", "mode": "NULLABLE"},
            {"name": "tag", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "groups", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "Name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "skills", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "value", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "relations", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "value", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "about", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "value", "type": "STRING", "mode": "NULLABLE"},
            {"name": "network", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "emails", "type": "STRING", "mode": "REPEATED"},
        {"name": "languages", "type": "STRING", "mode": "REPEATED"},
        {"name": "places", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "network", "type": "STRING", "mode": "NULLABLE"},
            {"name": "value", "type": "STRING", "mode": "NULLABLE"},
            {"name": "type", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "education", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "network", "type": "STRING", "mode": "NULLABLE"},
            {"name": "school", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "experience", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "network", "type": "STRING", "mode": "NULLABLE"},
            {"name": "start", "type": "NUMERIC", "mode": "NULLABLE"},
            {"name": "company", "type": "STRING", "mode": "NULLABLE"},
            {"name": "title", "type": "STRING", "mode": "NULLABLE"}
        ]},
        {"name": "networks", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "network", "type": "STRING", "mode": "NULLABLE"},
            {"name": "link", "type": "STRING", "mode": "NULLABLE"},
            {"name": "id", "type": "STRING", "mode": "REPEATED"}
        ]},
        {"name": "network", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "others", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "network", "type": "STRING", "mode": "NULLABLE"},
                {"name": "value", "type": "STRING", "mode": "NULLABLE"},
                {"name": "tag", "type": "STRING", "mode": "NULLABLE"}
            ]},
            {"name": "books", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "network", "type": "STRING", "mode": "NULLABLE"},
                {"name": "value", "type": "STRING", "mode": "NULLABLE"},
                {"name": "tag", "type": "STRING", "mode": "NULLABLE"}
            ]},
            {"name": "music", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "network", "type": "STRING", "mode": "NULLABLE"},
                {"name": "value", "type": "STRING", "mode": "NULLABLE"},
                {"name": "tag", "type": "STRING", "mode": "NULLABLE"}
            ]},
            {"name": "games", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "network", "type": "STRING", "mode": "NULLABLE"},
                {"name": "value", "type": "STRING", "mode": "NULLABLE"},
                {"name": "tag", "type": "STRING", "mode": "NULLABLE"}
            ]},
            {"name": "spotify", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "network", "type": "STRING", "mode": "NULLABLE"},
                {"name": "value", "type": "STRING", "mode": "NULLABLE"},
                {"name": "tag", "type": "STRING", "mode": "NULLABLE"}
            ]}
        ]}
    ]}
]
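
The error text suggests that at least one input line carries a JSON array where the schema declares a non-repeated field, even if the sample record looks fine. A minimal Python sketch for locating such lines before loading is shown here; the function names, the trimmed schema, and the second sample line are hypothetical and only illustrate the check.

```python
import json


def find_array_violations(record, schema, prefix=""):
    """Yield dotted paths where a JSON array sits under a field not marked REPEATED."""
    for field in schema:
        name = field["name"]
        value = record.get(name)
        if value is None:
            continue
        path = prefix + name
        if isinstance(value, list) and field.get("mode") != "REPEATED":
            yield path
            continue
        if field["type"] == "RECORD":
            # A REPEATED record is a list of objects; a NULLABLE one is a single object.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    yield from find_array_violations(child, field["fields"], path + ".")


def scan_lines(lines, schema):
    """Return (line_number, path) pairs for every offending field."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        for path in find_array_violations(json.loads(line), schema):
            hits.append((number, path))
    return hits


# Trimmed-down schema and made-up lines, for illustration only.
schema = [
    {"name": "_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "_source", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "fullName", "type": "STRING", "mode": "NULLABLE"},
        {"name": "links", "type": "STRING", "mode": "REPEATED"},
    ]},
]
sample_lines = [
    '{"_id": "a", "_source": {"fullName": "Mike Smith", "links": ["x"]}}',
    '{"_id": "b", "_source": {"fullName": ["Mike", "Smith"]}}',  # hypothetical bad line
]
hits = scan_lines(sample_lines, schema)
print(hits)
```

Running the same scan over the real file with the full schema should show whether some records store fullName as an array, in which case either the data or the field's mode would need to change.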

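[Question comments]:

First, I'd like to know: is this already newline-delimited JSON, and you pretty-printed it here for readability?

Yes, sir. One JSON object per line. I only pretty-printed it for this post.

[Answer 1]: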

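You can import the complete JSON lines as if they were CSV: essentially a single-column BigQuery table in which each row holds one JSON object as a string. You can then parse the JSON inside BigQuery however you like, with a query such as: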

WITH j AS (
  -- One raw JSON line, as it would sit in the single-column table
  SELECT """{"_index":"data","_type":"collection_v1","_id":"548d035f23r8987b768a5e60","_score":1,"_source":{"fullName":"Mike Smith","networks":[{"id":["12923449"],"network":"facebook","link":"https://www.facebook.com/127654449"}],"sex":{"network":"facebook","value":"male"},"interests":{},"score":1.045,"merged_by":"548f899444v5t4v45te9a4cc"}}""" j
)

SELECT index
  , STRUCT(
   JSON_EXTRACT_SCALAR(source, '$.fullName') AS fullName
   , [
       STRUCT(
       JSON_EXTRACT_SCALAR(source, '$.networks[0].id[0]') AS id
       , JSON_EXTRACT_SCALAR(source, '$.networks[0].network') AS network
       , JSON_EXTRACT_SCALAR(source, '$.networks[0].link') AS link)
     ] AS networks
   ) source
FROM (
  -- Pull out the scalar _index and keep _source as a JSON string
  SELECT JSON_EXTRACT_SCALAR(j.j, '$._index') index
    , JSON_EXTRACT(j.j, '$._source') source
  FROM j
)
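
To sanity-check what the query above should return for a given line, the same extraction can be mirrored locally. This is only a rough Python sketch: `extract` and `scalar` are hypothetical helper names, and `scalar` approximates `JSON_EXTRACT_SCALAR` by returning None for objects and arrays without converting values to strings.

```python
import json

line = (
    '{"_index":"data","_type":"collection_v1","_id":"548d035f23r8987b768a5e60",'
    '"_score":1,"_source":{"fullName":"Mike Smith","networks":[{"id":["12923449"],'
    '"network":"facebook","link":"https://www.facebook.com/127654449"}],'
    '"sex":{"network":"facebook","value":"male"},"interests":{},"score":1.045,'
    '"merged_by":"548f899444v5t4v45te9a4cc"}}'
)


def scalar(value):
    """Return value if it is a JSON scalar, else None (objects and arrays are dropped)."""
    return None if isinstance(value, (dict, list)) else value


def extract(raw_line):
    """Build the same shape as the query: index plus a trimmed source struct."""
    doc = json.loads(raw_line)
    source = doc.get("_source") or {}
    networks = source.get("networks") or [{}]
    first = networks[0]          # mirrors $.networks[0]
    ids = first.get("id") or [None]
    return {
        "index": scalar(doc.get("_index")),
        "source": {
            "fullName": scalar(source.get("fullName")),
            "networks": [{
                "id": scalar(ids[0]),  # mirrors $.networks[0].id[0]
                "network": scalar(first.get("network")),
                "link": scalar(first.get("link")),
            }],
        },
    }


row = extract(line)
print(row)
```

Note that a record whose fullName is an array would come back with fullName set to None here, which is also how a scalar extraction would hide the problem rather than fail the load.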

See:

https://medium.com/google-cloud/bigquery-lazy-data-loading-ddl-dml-partitions-and-half-a-trillion-wikipedia-pageviews-cd3eacd657b6
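
Loading each JSON line as one CSV column only works if the chosen field delimiter never occurs in the data; otherwise a line would be split into several columns. The sketch below is an assumption-laden pre-check, not part of the answer: `pick_unused_delimiter` and the candidate list are hypothetical, and CSV quoting would presumably also need to be disabled in the load job so that the double quotes inside the JSON are kept as-is.

```python
# Candidate single-character delimiters to try, in order of preference (an assumption).
CANDIDATE_DELIMITERS = ["\t", "|", "\x1f"]


def pick_unused_delimiter(lines, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate that appears in none of the lines, or None."""
    remaining = list(candidates)
    for line in lines:
        remaining = [c for c in remaining if c not in line]
        if not remaining:
            return None  # every candidate occurs somewhere in the data
    return remaining[0]


sample = ['{"a": "x|y"}', '{"b": [1, 2]}']
delimiter = pick_unused_delimiter(sample, candidates=["|", "\t"])
print(repr(delimiter))
```

For a large file, the same function can be fed the open file object directly, since iterating over it yields one line at a time.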

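[Comments]:

Good idea, but the file is too large to import as CSV. It errors out.

"Array specified for non-repeated field: _source.fullName." is the error you get when loading as JSON. What error does it give you when you load as CSV?

The file is too large. I'm now thinking of cutting it down.

How do you get that error with CSV but not with JSON? Both formats have the same limits: cloud.google.com/bigquery/quotas#load_jobs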
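
If the file does have to be cut down, newline-delimited input can be split on line boundaries so that every piece stays a valid load file. The following is a minimal sketch under that assumption; `split_lines_by_size` is a hypothetical helper, and `max_bytes` is a placeholder to be set from the limits on the quotas page rather than a documented value.

```python
def split_lines_by_size(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size, newlines included, stays within max_bytes.

    A single line larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line = line.rstrip("\n")
        line_size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


chunks = split_lines_by_size(["aaaa", "bbbb", "cc"], max_bytes=10)
print(chunks)
```

Each chunk can then be written to its own file with one record per line and loaded as a separate job.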
