Getting error while loading JSON into Bigquery using UI


Posted: 2018-09-03 00:07:35

Question:

I have to load a JSON file into BigQuery. Following the BigQuery documentation, I put my JSON in the required format, i.e. newline-delimited, with one JSON object per line.
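For reference, newline-delimited JSON (NDJSON) just means one complete JSON object per physical line. A minimal Python sketch of that layout follows. The helper names `to_ndjson` and `parse_ndjson` are hypothetical and not part of any BigQuery tooling. It also shows why a whole-document JSON parser or online formatter rejects a perfectly valid NDJSON file:

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON: one object per line."""
    # json.dumps escapes newlines inside strings as \n, so every record
    # is guaranteed to occupy exactly one physical line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def parse_ndjson(text):
    """Parse NDJSON by decoding each non-empty line independently."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [
    {"deviceId": "3c7a345dafcff93f", "event": "streamStop", "localHalfHour": 2},
    {"deviceId": "13231fd01222a28e", "event": "streamStop", "localHalfHour": 1},
]
text = to_ndjson(records)
assert parse_ndjson(text) == records

# A whole-document parser expects a single top-level value, so it stops
# at the start of the second object.
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("whole-file parse fails:", exc.msg)  # prints: whole-file parse fails: Extra data
```

Because each line is decoded on its own, a formatter error on the whole file says nothing about whether the individual rows are valid.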

The JSON file has about 10 million rows, and while loading it I get the following error:

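Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 2165; errors: 1. Please look into the error stream for more details.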
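With roughly 10 million rows, opening the file in an editor just to reach the reported row can be slow. Below is a minimal sketch that streams lines and returns one by number. It assumes, as the lookup below suggests, that the `Rows: 2165` figure in the error corresponds to the 1-based line number in the file. `read_line` is a hypothetical helper:

```python
import itertools


def read_line(lines, line_number):
    """Return the given 1-based line from an iterable of lines (e.g. an open file).

    Lines are consumed lazily via islice, so a 10-million-row file is never
    held in memory. Returns None if there are fewer than line_number lines.
    """
    if line_number < 1:
        raise ValueError("line_number is 1-based")
    for line in itertools.islice(lines, line_number - 1, line_number):
        return line.rstrip("\n")
    return None


# Usage (the file name is a placeholder):
# with open("events.json", encoding="utf-8") as f:
#     print(read_line(f, 2165))
```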

When I look up line 2165, it reads as follows:

"deviceId":"13231fd01222a28e","dow":"Wednesday","downloadFlag":"N","email":"clstone898@gmail.com","emailSha256":"1bdf11821f867799bde022ccb57a2e899f827c988b4275571ffd60279c863272","event":"streamStop","firebaseUID":"UDVC3hyQpBWLCnlhXhjAQBeI95Q2","halfHourFull":"08h1","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":1,"login":"google","minutesSinceMidnight":497,"quarterHourFull":"08q2","stationName":"Fox hit 101.9","streamListenMethod":"BluetoothA2DPOutput","timestampLocal":"2018-02-07T08:017:04.679+11:00","timestampUTC":"2018-02-06T21:17:04.679Z"

When I load this line on its own, it loads successfully. Please advise what is incorrect here.

I am loading this JSON from the BigQuery UI with the schema auto-detect option.
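One plausible reason the row loads on its own but not as part of the full file is an assumption, not something stated in this thread. Schema auto-detection infers column types from a sample of rows rather than from every row. On its own, the malformed `timestampLocal` value can only be typed as STRING. In the full file, the well-formed values in the sample make the column TIMESTAMP, and row 2165 then fails conversion. The toy model below illustrates this idea. The regex, the `infer_type` helper and the `sample_size` default of 500 (meant to mirror the sample size described in the BigQuery auto-detection documentation) are simplifications, not BigQuery's actual algorithm:

```python
import re

# Simplified TIMESTAMP-shaped string: date, "T" or space, HH:MM, optional :SS
# with up to 6 fractional digits, optional "Z" or +HH:MM / -HH:MM offset.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d{1,6})?)?(Z|[+-]\d{2}:\d{2})?")


def infer_type(values, sample_size=500):
    """Toy inference: TIMESTAMP if every sampled value looks like one, else STRING."""
    sample = values[:sample_size]
    return "TIMESTAMP" if sample and all(TS_RE.fullmatch(v) for v in sample) else "STRING"


good = "2018-02-07T07:43:00.39+11:00"
bad = "2018-02-07T08:017:04.679+11:00"

print(infer_type([bad]))                  # STRING: loaded alone, the odd value is just text
print(infer_type([good] * 2164 + [bad]))  # TIMESTAMP: the bad row (2165th) is outside the sample
```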

Here are some sample records:

"deviceId":"3c7a345dafcff93f","dow":"Tuesday","downloadFlag":"N","email":"psloper.ps@gmail.com","emailSha256":"1cebae8c35db32edcd35e746863fc65a04ac68f2f5b3350f2df477a86bfaa07d","event":"streamStop","firebaseUID":"AMFYYjsvZjauhCktJ5lUzZj0d3D2","halfHourFull":"21h2","liveFlag":"Y","localDate":"2018-02-06","localHalfHour":2,"login":"google","minutesSinceMidnight":1311,"quarterHourFull":"21q4","stationName":"hit 105","streamListenMethod":"Headphones","timestampLocal":"2018-02-06T21:51:40.216+10:00","timestampUTC":"2018-02-06T11:51:40.216Z"
"deviceId":"2f1a8c84c738b752","dow":"Wednesday","downloadFlag":"N","email":"kory.maxwell@icloud.com","emailSha256":"13348786c15bff95e4afb4968a9bdbe883b70206a737c02c89fc8215f2a4e101","event":"streamStop","facebookId":"1784054201892593","firebaseUID":"Tx1bHjP6dhaDB2nl2c7yi2KZHsq2","halfHourFull":"06h1","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":1,"login":"facebook","minutesSinceMidnight":384,"quarterHourFull":"06q2","stationName":"hit 105","streamListenMethod":"BluetoothA2DPOutput","timestampLocal":"2018-02-07T06:24:44.533+10:00","timestampUTC":"2018-02-06T20:24:44.533Z"
"deviceId":"AA1D685F-6BF6-B0DC-0000-000000000000","dow":"Wednesday","email":"lozza073@bigpond.com","emailSha256":"525db286e9a35c9f9f55db0ce338762eee02c51955ede6b35afb7e808581664f","event":"streamStart","facebookId":"10215879897177171","firebaseUID":"f2efT61sW5gHTfgEbtNfyaUKWaF3","halfHourFull":"7h2","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":2,"login":"facebook","minutesSinceMidnight":463,"quarterHourFull":"7q3","stationName":"Fox hit 101.9","streamListenMethod":"Speaker","timestampLocal":"2018-02-07T07:43:00.39+11:00","timestampUTC":"2018-02-06T20:43:00.39Z"
"deviceId":"AEFD39FC-B116-4063-0000-000000000000","dow":"Wednesday","event":"catchUpPause","facebookId":"379907925802180","firebaseUID":"vQPh9tbO3Yge88fpMyNUFzJO7dl1","halfHourFull":"7h2","liveFlag":"N","localDate":"2018-02-07","localHalfHour":2,"login":"facebook","minutesSinceMidnight":465,"quarterHourFull":"7q4","stationName":"Fox hit 101.9","streamListenMethod":"USBAudio","timestampLocal":"2018-02-07T07:45:08.524+11:00","timestampUTC":"2018-02-06T20:45:08.524Z"
"deviceId":"AA1D685F-6BF6-B0DC-0000-000000000000","dow":"Wednesday","email":"lozza073@bigpond.com","emailSha256":"525db286e9a35c9f9f55db0ce338762eee02c51955ede6b35afb7e808581664f","event":"streamStop","facebookId":"10215879897177171","firebaseUID":"f2efT61sW5gHTfgEbtNfyaUKWaF3","halfHourFull":"7h2","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":2,"login":"facebook","minutesSinceMidnight":475,"quarterHourFull":"7q4","stationName":"Fox hit 101.9","streamListenMethod":"Speaker","timestampLocal":"2018-02-07T07:55:35.788+11:00","timestampUTC":"2018-02-06T20:55:35.788Z"
"deviceId":"AA1D685F-6BF6-B0DC-0000-000000000000","dow":"Wednesday","email":"lozza073@bigpond.com","emailSha256":"525db286e9a35c9f9f55db0ce338762eee02c51955ede6b35afb7e808581664f","event":"streamStart","facebookId":"10215879897177171","firebaseUID":"f2efT61sW5gHTfgEbtNfyaUKWaF3","halfHourFull":"7h2","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":2,"login":"facebook","minutesSinceMidnight":477,"quarterHourFull":"7q4","stationName":"Fox hit 101.9","streamListenMethod":"Speaker","timestampLocal":"2018-02-07T07:57:42.343+11:00","timestampUTC":"2018-02-06T20:57:42.343Z"
"deviceId":"13231fd01222a28e","dow":"Wednesday","downloadFlag":"N","email":"clstone898@gmail.com","emailSha256":"1bdf11821f867799bde022ccb57a2e899f827c988b4275571ffd60279c863272","event":"streamStop","firebaseUID":"UDVC3hyQpBWLCnlhXhjAQBeI95Q2","halfHourFull":"08h1","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":1,"login":"google","minutesSinceMidnight":497,"quarterHourFull":"08q2","stationName":"Fox hit 101.9","streamListenMethod":"BluetoothA2DPOutput","timestampLocal":"2018-02-07T08:017:04.679+11:00","timestampUTC":"2018-02-06T21:17:04.679Z"

Any help is much appreciated.

Comments:

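Here's your first bit of help: put your JSON in a block quote, code block or spoiler, and maybe run it through a formatter.

@Kwright02 That's because it is NDJSON. When I run it through a formatter, I get the following error:
Parse error on line 1:
...-01T00:01:47.136Z"}{"deviceId":"399a649
----------------------^
Expecting 'EOF', '}', ',', ']', got '{'

Answer 1: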

Well, look at that particular line, 2165:

"deviceId":"13231fd01222a28e","dow":"Wednesday","downloadFlag":"N","email":"clstone898@gmail.com","emailSha256":"1bdf11821f867799bde022ccb57a2e899f827c988b4275571ffd60279c863272","event":"streamStop","firebaseUID":"UDVC3hyQpBWLCnlhXhjAQBeI95Q2","halfHourFull":"08h1","liveFlag":"Y","localDate":"2018-02-07","localHalfHour":1,"login":"google","minutesSinceMidnight":497,"quarterHourFull":"08q2","stationName":"Fox hit 101.9","streamListenMethod":"BluetoothA2DPOutput","timestampLocal":"2018-02-07T08:017:04.679+11:00","timestampUTC":"2018-02-06T21:17:04.679Z"

In particular:

"timestampLocal":"2018-02-07T08:017:04.679+11:00"

And the error message:

Couldn't convert value to timestamp: Could not parse '2018-02-07T08:017:04.679+11:00' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]
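Rather than fixing one row per failed load, the whole file can be checked in a single pass. The sketch below assumes the timestamps live in `timestampLocal` and `timestampUTC`, as in the sample records. The regex approximates the format quoted in the error message, plus the `T` separator and `Z`/`+HH:MM` offsets used in this data. It is not BigQuery's exact parser. `find_bad_timestamps` is a hypothetical helper:

```python
import json
import re

# Approximation of YYYY-MM-DD HH:MM[:SS[.SSSSSS]] from the error message,
# also allowing "T" as the separator and a Z / +HH:MM / -HH:MM offset.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d{1,6})?)?(Z|[+-]\d{2}:\d{2})?")


def find_bad_timestamps(lines, fields=("timestampLocal", "timestampUTC")):
    """Yield (line_number, field, value) for string timestamp fields that do not fit TS_RE."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field in fields:
            value = record.get(field)
            # Only string values are checked; missing fields are ignored.
            if isinstance(value, str) and not TS_RE.fullmatch(value):
                yield line_number, field, value


# Usage (the file name is a placeholder):
# with open("events.json", encoding="utf-8") as f:
#     for hit in find_bad_timestamps(f):
#         print(hit)
```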

So if you change "T08:017:04.679" to "T08:17:04.679" (17 minutes instead of 017), it will work. :)
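If more rows share this defect, a scripted repair may be easier than editing by hand. The sketch below assumes the only problem is a three-digit minute field with a leading zero, so `T08:017:` becomes `T08:17:`. Any other kind of corruption would need its own rule. `fix_minutes` and `repair_line` are hypothetical names:

```python
import json
import re

# Hypothetical repair rule: a minutes field written with three digits and a
# leading zero ("T08:017:") is collapsed to two digits ("T08:17:").
MINUTES_RE = re.compile(r"(T\d{2}):0(\d{2}):")


def fix_minutes(value):
    """Apply the minutes repair; well-formed timestamps are returned unchanged."""
    return MINUTES_RE.sub(r"\1:\2:", value)


def repair_line(line, fields=("timestampLocal", "timestampUTC")):
    """Return the NDJSON line with the repair applied to the named timestamp fields."""
    record = json.loads(line)
    for field in fields:
        if isinstance(record.get(field), str):
            record[field] = fix_minutes(record[field])
    return json.dumps(record, separators=(",", ":"))


# Streaming rewrite of a large file (paths are placeholders):
# with open("events.json", encoding="utf-8") as src, \
#         open("events_fixed.json", "w", encoding="utf-8") as dst:
#     for line in src:
#         if line.strip():
#             dst.write(repair_line(line) + "\n")
```

Re-serializing with `json.dumps` (no `indent`) keeps each record on a single line, so the output is still valid NDJSON.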
