Failed to parse JSON using BigQuery from Google storage bucket


【Posted】: 2020-11-24 20:01:39 【Question】:

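I uploaded the JSON below from my backend to a Google storage bucket. Now I am trying to connect this JSON to a BigQuery table, but I get the following error. What do I need to change?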

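Error while reading table: XXXXX, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string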

[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"],["video_screen","click_on_screen","false","202011230213","1","4","0"],["video_screen","click_on_screen","false","202011230633","1","4","0"],["video_screen","click_on_screen","false","202011230709","1","4","0"],["video_screen","click_on_screen","false","202011230712","1","4","0"],["video_screen","click_on_screen","false","202011230723","1","4","0"],["video_screen","click_on_screen","false","202011230725","1","4","0"],["video_screen","click_on_screen","false","202011231739","1","4","0"],["category","select","MTV","202011232228","1","3","0"],["sign_in","enter","user_details","202011230108","2","3","0"],["sign_in","enter","user_details","202011230442","2","3","0"],["video","select","youtube","202011230108","1","3","0"],["video","select","youtube","202011230633","1","3","0"],["video_screen","click_on_screen","false","202011230458","1","3","0"],["video_screen","click_on_screen","false","202011230552","1","3","0"],["video_screen","click_on_screen","false","202011230612","1","3","0"],["video_screen","click_on_screen","false","202011231740","1","3","0"],["category","select","Disney Karaoke","202011232228","1","2","0"],["category","select","Duet","202011232228","1","2","0"],["category","select","Free","202011230726","1","2","0"],["category","select","Free","202011231830","2","2","0"],["category","select","Free","202011232228","1","2","0"],["category","select","Love","202011232228","1","2","0"],["category","select","New","202011232228","1","2","0"],["category","select","Pitch Perfect 2","202011232228","1","2","0"],["developer","click","hithub","202011230749","1","2","0"],["sign_in","enter","user_details","202011230134","1","2","0"],["sign_in","enter","user_details","202011230211","1","2","0"],["sign_in","enter","user_details","202011230219","1","2","0"]]


【Answer 1】:

BigQuery reads JSONL (newline-delimited JSON) files. The example is not in that format.

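- JSONL uses \n as the delimiter between records. The example is all on one line, with the records separated by commas.
- Each JSONL line is a JSON object, so it starts with { and ends with }. The example consists of JSON arrays, which are not supported.
- JSONL is based on JSON, so every data element needs a name. The first record might be written as {"field1_name": "video_screen", "field2_name": "click_on_screen", "field3_name": false, "field4_name": 202011231958, "field5_name": 1, "field6_name": 43, "field7_name": 0}
- JSONL has no outer brackets []. The first line starts with { rather than [, and the last line ends with } rather than ].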
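The points above can be sketched as a small conversion step. This is only an illustration under assumptions: the column names reuse the placeholder `field1_name` … `field7_name` naming from the answer (the real column names are not given in the question), and every value is kept as a string, because the third column mixes values such as "false", "2" and "user_details".

```python
import json

# Hypothetical column names, following the answer's placeholder naming.
FIELD_NAMES = [f"field{i}_name" for i in range(1, 8)]


def rows_to_jsonl(json_text, field_names=FIELD_NAMES):
    """Turn a JSON array of arrays into newline-delimited JSON objects."""
    rows = json.loads(json_text)
    lines = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(f"expected {len(field_names)} values, got {len(row)}")
        # One named object per record, serialized on a single line.
        lines.append(json.dumps(dict(zip(field_names, row))))
    # Records are separated by \n; there is no outer [ ] and no commas between them.
    return "\n".join(lines) + "\n"


# Two records copied from the question's file.
sample = (
    '[["video_screen","click_on_screen","false","202011231958","1","43","0"],'
    '["buy","error","2","202011231807","1","6","0"]]'
)
jsonl = rows_to_jsonl(sample)
print(jsonl, end="")
```

Each printed line is then a standalone JSON object, which is the shape the answer describes. Converting the numeric-looking strings to numbers is left out here on purpose, since the intended table schema is not stated.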

【Comments】:

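OK, so do you know how to insert this data from the bucket into BigQuery?

@idan The file will have to be opened by some tool other than BigQuery. You can either rewrite the file in a format that BigQuery understands, or open the file and write directly to BigQuery using the BigQuery API from a programming language.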
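The second option in that comment (writing directly through the BigQuery API) might look roughly like the sketch below. It assumes the `google-cloud-bigquery` Python client, whose `Client.insert_rows_json` method takes a table ID and a list of dicts and returns a list of per-row errors. The table ID `my-project.my_dataset.events` and the column names are hypothetical, and `DryRunClient` is only a stand-in so the sketch runs without credentials.

```python
import json

# Hypothetical column names; the real schema is not given in the question.
FIELD_NAMES = [f"field{i}_name" for i in range(1, 8)]


def stream_rows(client, table_id, rows, field_names=FIELD_NAMES):
    """Send array-style rows to BigQuery as named JSON rows."""
    json_rows = [dict(zip(field_names, row)) for row in rows]
    # insert_rows_json returns an empty list when every row was accepted.
    errors = client.insert_rows_json(table_id, json_rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
    return len(json_rows)


class DryRunClient:
    """Stand-in that mimics the one method used above (not a real client)."""

    def __init__(self):
        self.sent = []

    def insert_rows_json(self, table_id, json_rows):
        self.sent.extend(json_rows)
        return []


rows = json.loads('[["buy","error","2","202011231807","1","6","0"]]')
dry_run = DryRunClient()
count = stream_rows(dry_run, "my-project.my_dataset.events", rows)
print(count, dry_run.sent[0]["field1_name"])

# With real credentials, the stand-in would be replaced by something like:
#   from google.cloud import bigquery
#   stream_rows(bigquery.Client(), "my-project.my_dataset.events", rows)
```

Passing the client in as a parameter keeps the row-naming logic separate from authentication, so the same function can be exercised locally before it is pointed at a real table.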
