Failure parsing JSON with mongoimport
【Title】: Failure parsing JSON with mongoimport 【Posted】: 2011-06-19 23:35:28 【Question】: I get an "Assertion: 10340:Failure parsing JSON string" error when I run mongoimport at the end of a pipe fed by the Github API, like this:
lsoave@ubuntu:~/rails/github/gitwatcher$ curl https://api.github.com/users/lgs/repos | mongoimport -h localhost -d gitwatch_dev -c repo -f repositories
connected to: localhost
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Mon Jun 20 00:56:01 Assertion: 10340:Failure parsing JSON string near: [
100 22303 100 22303 0 0 31104 0 --:--:-- --:--:-- --:--:-- 111k
0x816d8a1 0x8118814 0x84b357a 0x84b5bb8 0x84adc65 0x84b2ee1 0x60bbd6 0x80f5bc1
mongoimport(_ZN5mongo11msgassertedEiPKc+0x221) [0x816d8a1]
mongoimport(_ZN5mongo8fromjsonEPKcPi+0x3b4) [0x8118814]
mongoimport(_ZN6Import9parseLineEPc+0x7a) [0x84b357a]
mongoimport(_ZN6Import3runEv+0x1a98) [0x84b5bb8]
mongoimport(_ZN5mongo4Tool4mainEiPPc+0x1ce5) [0x84adc65]
mongoimport(main+0x51) [0x84b2ee1]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x60bbd6]
mongoimport(__gxx_personality_v0+0x3f1) [0x80f5bc1]
exception:Failure parsing JSON string near: [
[
...
...
Mon Jun 20 00:45:20 Assertion: 10340:Failure parsing JSON string near: "name": "t
0x816d8a1 0x8118814 0x84b357a 0x84b5bb8 0x84adc65 0x84b2ee1 0x126bd6 0x80f5bc1
mongoimport(_ZN5mongo11msgassertedEiPKc+0x221) [0x816d8a1]
mongoimport(_ZN5mongo8fromjsonEPKcPi+0x3b4) [0x8118814]
mongoimport(_ZN6Import9parseLineEPc+0x7a) [0x84b357a]
mongoimport(_ZN6Import3runEv+0x1a98) [0x84b5bb8]
mongoimport(_ZN5mongo4Tool4mainEiPPc+0x1ce5) [0x84adc65]
mongoimport(main+0x51) [0x84b2ee1]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x126bd6]
mongoimport(__gxx_personality_v0+0x3f1) [0x80f5bc1]
exception:Failure parsing JSON string near: "name": "t
"name": "tentacles"
...
...
See the full trace here: http://pastie.org/2093486. In any case, the JSON I get back from the Github API looks fine (curl https://api.github.com/users/lgs/repos):
[
  {
    "open_issues": 0,
    "watchers": 3,
    "homepage": "http://scrubyt.org",
    "language": null,
    "forks": 1,
    "pushed_at": "2009-02-25T22:49:08Z",
    "created_at": "2009-02-25T22:22:40Z",
    "fork": true,
    "url": "https://api.github.com/repos/lgs/scrubyt",
    "private": false,
    "size": 188,
    "description": "A simple to learn and use, yet powerful web scraping toolkit!",
    "owner": {
      "avatar_url": "https://secure.gravatar.com/avatar/9c7d80ebc20ab8994e51b9f7518909ae?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png",
      "login": "lgs",
      "url": "https://api.github.com/users/lgs",
      "id": 1573
    },
    "name": "scrubyt",
    "html_url": "https://github.com/lgs/scrubyt"
  },
  ...
  ...
]
Here is a snippet: http://www.pastie.org/2093524.
If I try specifying the csv format instead, it works:
lsoave@ubuntu:~/rails/github/gitwatcher$ curl https://api.github.com/users/lgs/repos | mongoimport -h localhost -d gitwatch_dev -c repo -f repositories --type csv
connected to: localhost
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 22303 100 22303 0 0 23914 0 --:--:-- --:--:-- --:--:-- 106k
imported 640 objects
lsoave@ubuntu:~/rails/github/gitwatcher$
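【Comments】:
Don't use volatile services like pastebin to reference code/data from SO posts.
... volatile? ... pastebin? The code links above are neither pastebin nor volatile ;-) Thanks for the downvote!
I can reproduce this on 1.8.1 64-bit. Looking into what might be happening. I pasted the output into a JSON validator and the JSON is valid.
【Answer 1】: Using "mongoimport --jsonArray ..." worked for me.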
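As a minimal sketch, this is how the answer might be applied to the command from the question. `--jsonArray` tells mongoimport to treat its input as a single JSON array of documents instead of one document per line. The host, database and collection names come from the question. The `import_repos` function and the `MONGOIMPORT` override variable are hypothetical names introduced only for this illustration. The `-f repositories` option from the question is left out on the assumption that field names are only meaningful for csv/tsv input.

```shell
#!/bin/sh
# Hypothetical helper (not part of mongoimport): wraps the import command
# from the question and adds --jsonArray.
# MONGOIMPORT may be overridden to preview the arguments without a database.
MONGOIMPORT=${MONGOIMPORT:-mongoimport}

import_repos() {
    # Reads a JSON array on stdin; --jsonArray makes mongoimport accept
    # the whole array instead of parsing the input line by line.
    "$MONGOIMPORT" -h localhost -d gitwatch_dev -c repo --jsonArray
}

# Usage (needs network access and a running mongod):
#   curl -s https://api.github.com/users/lgs/repos | import_repos
```

With this flag each element of the array should become its own document in the `repo` collection.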
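【Comments】:
Thanks! I spent an hour banging my head over why it said my JSON was invalid.
Yes, this works great, even with newlines in there.
Thanks, this worked for me too! I used this: mongoimport --jsonArray -h
【Answer 2】: OK, so here is what might be going on. First, I removed all the newlines from the JSON, which brought the number of errors down from n (where n = number of lines) to 1. Then it turned out that I had to wrap the JSON array in another variable before it would work. I think mongoimport is designed to be used together with mongoexport, so you most likely cannot use it to import arbitrary JSON. If you want to anyway, what I did is what you would have to do in your code before calling the import utility.
I used only 1 record while testing. Here is that record without newlines.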
[{"url":"https://api.github.com/repos/lgs/scrubyt", "pushed_at": "2009-02-25T22:49:08Z","homepage": "http://scrubyt.org", "forks": 1,"language": null,"fork": true,"html_url": "https://github.com/lgs/scrubyt","created_at": "2009-02-25T22:22:40Z", "open_issues": 0,"private": false,"size": 188,"watchers": 3,"owner": {"url": "https://api.github.com/users/lgs","login": "lgs","id": 1573,"avatar_url": "https://secure.gravatar.com/avatar/9c7d80ebc20ab8994e51b9f7518909ae?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"},"name": "scrubyt","description": "A simple to learn and use, yet powerful web scraping toolkit!"}]
Then I wrapped it in somedata (you can use any name here):
{somedata:[{"url":"https://api.github.com/repos/lgs/scrubyt", "pushed_at": "2009-02-25T22:49:08Z","homepage": "http://scrubyt.org", "forks": 1,"language": null,"fork": true,"html_url": "https://github.com/lgs/scrubyt","created_at": "2009-02-25T22:22:40Z", "open_issues": 0,"private": false,"size": 188,"watchers": 3,"owner": {"url": "https://api.github.com/users/lgs","login": "lgs","id": 1573,"avatar_url": "https://secure.gravatar.com/avatar/9c7d80ebc20ab8994e51b9f7518909ae?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"},"name": "scrubyt","description": "A simple to learn and use, yet powerful web scraping toolkit!"}]}
Now I can see the record in Mongo.
> db.repo.findOne()
{
    "_id" : ObjectId("4dff91d29c73f72483e82ef2"),
    "somedata" : [
        {
            "url" : "https://api.github.com/repos/lgs/scrubyt",
            "pushed_at" : "2009-02-25T22:49:08Z",
            "homepage" : "http://scrubyt.org",
            "forks" : 1,
            "language" : null,
            "fork" : true,
            "html_url" : "https://github.com/lgs/scrubyt",
            "created_at" : "2009-02-25T22:22:40Z",
            "open_issues" : 0,
            "private" : false,
            "size" : 188,
            "watchers" : 3,
            "owner" : {
                "url" : "https://api.github.com/users/lgs",
                "login" : "lgs",
                "id" : 1573,
                "avatar_url" : "https://secure.gravatar.com/avatar/9c7d80ebc20ab8994e51b9f7518909ae?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"
            },
            "name" : "scrubyt",
            "description" : "A simple to learn and use, yet powerful web scraping toolkit!"
        }
    ]
}
Hope this helps!
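The two steps described in this answer (strip the newlines, then wrap the array under a key) could be sketched roughly as below. The function name `wrap_array` is hypothetical, and the key is quoted here for strictness even though the answer writes it unquoted. Deleting raw newlines should be safe for valid JSON, because a newline inside a string has to be written as the escape `\n` and not as a literal line break.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON array on stdin and emit a single-line
# object of the form {"<key>":<array>}.
wrap_array() {
    # $1 is the wrapping key, e.g. somedata
    printf '{"%s":' "$1"
    tr -d '\n'            # join all input lines into one
    printf '}\n'
}

# Usage, assuming repos.json holds the array returned by the Github API:
#   wrap_array somedata < repos.json > repos.wrapped.json
```

As the `findOne()` output in the answer shows, this stores the entire array inside one document, so it is a different result from importing each repository as its own document.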
【Comments】:
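【Answer 3】: After removing every "\n", this worked fine for me. On Linux you can use tr, writing the result to a new file (redirecting back onto file.json would truncate it before it is read): tr -d '\n' < file.json > file.import.json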
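If the file really has to be rewritten in place, one possible approach is to go through a temporary file. The shell opens and truncates the target of `>` before the pipeline starts reading, so using the same path for input and output generally loses the data. The function name `flatten_in_place` is hypothetical, and `mktemp` is assumed to be available, as it is on typical Linux systems.

```shell
#!/bin/sh
# Hypothetical helper: strip all newlines from a file, replacing it only
# after the flattened copy has been written successfully.
flatten_in_place() {
    tmp=$(mktemp) || return 1
    # Write to the temporary file first, then move it over the original.
    if tr -d '\n' < "$1" > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage:
#   flatten_in_place file.json
```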
【Comments】:
【Answer 4】: Using the answers from @Daniel and @lobster1234, I created a script that imports JSON entries into mongo.
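#!/bin/sh
# Import a JSON array file into MongoDB after stripping its newlines.
if [ -z "$1" ]; then
    echo "missing argument"
    exit 1
fi
# Drop a trailing .json so the base name can be reused below.
FILE=${1%%.json}
echo "$FILE"
# Join the array onto a single line in a separate file.
tr -d '\n' < "$FILE.json" > "$FILE.import.json"
mongoimport --collection collection --db main --file "$FILE.import.json" --jsonArray --upsert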
【Comments】: