Failed to parse JSON using BigQuery from Google storage bucket
【Posted】: 2020-11-24 20:01:39
【Question】: I uploaded the JSON below from my backend to a Google storage bucket. Now I am trying to connect this JSON to a BigQuery table, but I get the following error. What do I need to change?
Error while reading table: XXXXX, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string
[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"],["video_screen","click_on_screen","false","202011230213","1","4","0"],["video_screen","click_on_screen","false","202011230633","1","4","0"],["video_screen","click_on_screen","false","202011230709","1","4","0"],["video_screen","click_on_screen","false","202011230712","1","4","0"],["video_screen","click_on_screen","false","202011230723","1","4","0"],["video_screen","click_on_screen","false","202011230725","1","4","0"],["video_screen","click_on_screen","false","202011231739","1","4","0"],["category","select","MTV","202011232228","1","3","0"],["sign_in","enter","user_details","202011230108","2","3","0"],["sign_in","enter","user_details","202011230442","2","3","0"],["video","select","youtube","202011230108","1","3","0"],["video","select","youtube","202011230633","1","3","0"],["video_screen","click_on_screen","false","202011230458","1","3","0"],["video_screen","click_on_screen","false","202011230552","1","3","0"],["video_screen","click_on_screen","false","202011230612","1","3","0"],["video_screen","click_on_screen","false","202011231740","1","3","0"],["category","select","Disney Karaoke","202011232228","1","2","0"],["category","select","Duet","202011232228","1","2","0"],["category","select","Free","202011230726","1","2","0"],["category","select","Free","202011231830","2","2","0"],["category","select","Free","202011232228","1","2","0"],["category","select","Love","202011232228","1","2","0"],["category","select","New","202011232228","1","2","0"],["category","select","Pitch Perfect 2","202011232228","1","2","0"],["developer","click","hithub","202011230749","1","2","0"],["sign_in","enter","user_details","202011230134","1","2","0"],["sign_in","enter","user_details","202011230211","1","2","0"],["sign_in","enter","user_details","202011230219","1","2","0"]]
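【Answer 1】: BigQuery reads JSONL (newline-delimited JSON) files. This example is not in that format.
- JSONL uses \n as the delimiter between records. The example is all on one line, with records separated by commas.
- Each JSONL line is a JSON object, so it starts with { and ends with }. The example contains JSON arrays, which are not supported.
- JSONL is based on JSON. Every data element needs a name. So the first record might look like {"field1_name": "video_screen", "field2_name": "click_on_screen", "field3_name": false, "field4_name": 202011231958, "field5_name": 1, "field6_name": 43, "field7_name": 0}
- JSONL has no outer brackets []. The first line starts with { rather than [, and the last line ends with } rather than ].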
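The points above can be sketched as a small conversion script. This is only a minimal sketch, not the asker's actual pipeline: the names in `FIELD_NAMES` are placeholders (the question never says what the seven values mean), and `array_json_to_jsonl` is a hypothetical helper. Values are kept as strings, exactly as they appear in the uploaded file.

```python
import json

# Placeholder column names (an assumption): the question does not say
# what the seven values in each record represent.
FIELD_NAMES = [
    "field1_name", "field2_name", "field3_name", "field4_name",
    "field5_name", "field6_name", "field7_name",
]


def array_json_to_jsonl(text, field_names=FIELD_NAMES):
    """Turn a JSON array of arrays into newline-delimited JSON objects."""
    rows = json.loads(text)
    lines = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(f"expected {len(field_names)} values, got {len(row)}")
        # One named object per record, serialized on a single line.
        lines.append(json.dumps(dict(zip(field_names, row))))
    # Records are separated by \n; there are no outer brackets.
    return "\n".join(lines) + "\n"


# First two records from the question.
sample = (
    '[["video_screen","click_on_screen","false","202011231958","1","43","0"],'
    '["buy","error","2","202011231807","1","6","0"]]'
)
print(array_json_to_jsonl(sample), end="")
```

Writing the returned string to a file and uploading that file in place of the original gives BigQuery one object per line. Converting the string values to numbers or booleans would be an extra step, depending on the table schema.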
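【Comments】:
OK, so do you know how I can insert this data from the bucket into BigQuery?
@idan The file will have to be opened by some tool other than BigQuery. You can either rewrite the file in a format BigQuery understands, or open the file and write directly to BigQuery using the BigQuery API from a programming language.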
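The second option from the reply (writing to BigQuery through its API) might look roughly like the sketch below. It assumes the `google-cloud-bigquery` Python client is installed and credentials are already configured. The helper names, the field names, and the table ID in the usage comment are hypothetical, and fetching the file from the bucket is left out.

```python
import json


def rows_from_array_json(text, field_names):
    """Parse a JSON array of arrays into a list of named row dicts."""
    return [dict(zip(field_names, row)) for row in json.loads(text)]


def load_rows_into_bigquery(rows, table_id):
    """Load row dicts into a BigQuery table and return the number of rows written."""
    # Imported here so the parsing helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # Let BigQuery infer the schema; an explicit schema may be preferable in practice.
    job_config = bigquery.LoadJobConfig(autodetect=True)
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # Block until the load job finishes.
    return job.output_rows


# Usage (not executed here; the table ID is a made-up example):
# names = ["field1_name", "field2_name", "field3_name", "field4_name",
#          "field5_name", "field6_name", "field7_name"]
# rows = rows_from_array_json(file_contents, names)
# load_rows_into_bigquery(rows, "my_project.my_dataset.events")
```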