How do I save my Python crawler output to a JSON file?
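【Title】: How do I save my Python crawler output to a JSON file? 【Posted】: 2015-01-18 16:09:51 【Question】: I recently started writing and learning Python, and I am currently developing a web crawler. At the moment it only prints the search results. What I want is for it to save the data to a JSON file.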
import requests
import json
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# Every product on the listing page sits in a <div class="listRow">
g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        print(item.find_all("span", {"class": "name"})[0].text)  # 1
        print(item.find_all("span", {"class": "additional"})[0].text)  # 2
        print(item.find_all("span", {"class": "info"})[0].text)  # 3
        print(item.find_all("span", {"class": "info"})[1].text)  # 4
        print(item.find_all("span", {"class": "info"})[2].text)  # 5
        print(item.find_all("span", {"class": "price right right10"})[0].text)  # 6
    except:
        pass
This is what I want it to return:
{"product1": [{"1": "itemfindallresults1", "2": "itemfindallresults2"}]} etc
So how do I do that? Thanks in advance.
【Comments】:
First create my_data = {"product1": [ ... ]}, then use json.dump(my_data, ...)
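The comment's suggestion could look roughly like the sketch below. It is only one possible reading of the desired output, not the commenter's code: build_products and sample_rows are hypothetical names, and the sample rows stand in for the texts that the question's item.find_all(...) calls return.

```python
import json


def build_products(rows):
    """Turn scraped rows (lists of field texts) into the nested structure."""
    my_data = {}
    for index, fields in enumerate(rows, start=1):
        # Number the fields "1", "2", ... as in the question's desired output.
        numbered = {str(position): text
                    for position, text in enumerate(fields, start=1)}
        my_data["product%d" % index] = [numbered]
    return my_data


# Hypothetical sample rows; a real crawler would append one list per listRow
# instead of printing the texts.
sample_rows = [
    ["itemfindallresults1", "itemfindallresults2"],
    ["other1", "other2"],
]

my_data = build_products(sample_rows)

# json.dumps returns the JSON text; json.dump(my_data, f) writes the same
# text to an open file object f instead.
json_text = json.dumps(my_data, indent=2)
```

Collecting everything into one dictionary first and serializing once at the end keeps the file valid JSON, which writing one fragment per loop iteration would not guarantee.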
【Answer 1】:
A simple use of JSON is:
import json
# open the file "filename" in write ("w") mode
file = open("filename", "w")
# just an example dictionary to be dumped into "filename"
output = {"stuff": [1, 2, 3]}
# dumps "output" encoded in the JSON format into "filename"
json.dump(output, file)
file.close()
Hope this helps.
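A variant of the same idea, as a sketch that assumes Python 3: a with-block closes the file automatically, and the indent and ensure_ascii arguments of json.dump make the file easier to read. The "price" entry and the file name output.json are made up for illustration.

```python
import json
import os
import tempfile

# Example data; the "price" entry is hypothetical and only shows a non-ASCII character.
output = {"stuff": [1, 2, 3], "price": "€ 10,00"}

# A temporary directory keeps the example from leaving files behind;
# a real crawler would simply use a path such as "output.json".
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.json")

    # The with-block closes the file even if json.dump raises.
    with open(path, "w", encoding="utf-8") as f:
        # indent pretty-prints; ensure_ascii=False writes "€" as-is
        # instead of the escape sequence \u20ac.
        json.dump(output, f, indent=2, ensure_ascii=False)

    # Read the file back to confirm that the data survives the round trip.
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)
```

Loading the file again with json.load is a cheap way to check that what was written is valid JSON.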
【Comments】:
【Answer 2】: A simple program that meets your requirements.
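import requests
import json
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# Product is not defined in this answer; it stands for a user-defined class
# with one setter per field. set_<field_name> is a placeholder, not a real method.
product = Product()
g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        product.set_<field_name>(item.find_all("span", {"class": "name"})[0].text)
        product.set_<field_name>(item.find_all("span", {"class": "additional"})[0].text)
        product.set_<field_name>(item.find_all("span", {"class": "info"})[0].text)
        product.set_<field_name>(item.find_all("span", {"class": "info"})[1].text)
        product.set_<field_name>(item.find_all("span", {"class": "info"})[2].text)
        product.set_<field_name>(item.find_all("span", {"class": "price right right10"})[0].text)
    except:
        pass

file = open("filename", "w")
# json.dump cannot serialize an arbitrary object, so the product is converted
# to a dict first; vars() assumes Product keeps its fields as instance attributes.
output = {"product1": vars(product)}
json.dump(output, file)
file.close()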
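Since this answer never defines Product, here is one way it could be filled in, as a sketch assuming Python 3.7 or later. The dataclass, its field names (taken from the CSS class names in the question) and the helper products_to_json are all hypothetical; the answer's author may have had a different class in mind.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    """Hypothetical container for one scraped listing row."""
    name: str
    additional: str
    info: list
    price: str


def products_to_json(products):
    """Serialize products as {"product1": {...}, "product2": {...}, ...}."""
    # asdict converts each dataclass instance into a plain dict,
    # which json.dumps can serialize.
    output = {"product%d" % index: asdict(product)
              for index, product in enumerate(products, start=1)}
    return json.dumps(output, indent=2, ensure_ascii=False)


# Made-up sample values standing in for the item.find_all(...) texts.
sample = [Product("Example name", "Example additional", ["a", "b", "c"], "€ 10,00")]
json_text = products_to_json(sample)
```

Creating one Product per listRow and appending it to a list avoids overwriting the same object on every loop iteration; the resulting text can be written with file.write(json_text), or the dictionary can be passed to json.dump directly.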
【Comments】: