cURL method in Python for JSON feed [duplicate]
Posted: 2015-10-04 07:18:38
【Question】: While building a Flask site, I am using an external JSON feed to supply content to a local mongoDB. The feed is parsed and fed in, reusing the keys from the JSON as keys in Mongo.
One of the keys available in the feed is called "img_url", and it contains (guess what) the URL of an image.
Is there a way to mimic PHP-style cURL in Python? I would like to take that key, download the image and store it somewhere locally, while keeping the other associated keys, and make that the entry in my database.
Here is the script I have so far:
import json
import sys
import urllib2
from datetime import datetime

import pymongo
import pytz

from utils import slugify
# from utils import logger

client = pymongo.MongoClient()
db = client.artlogic


def fetch_artworks():
    # logger.debug("downloading artwork data from Artlogic")
    AL_artworks = []
    AL_artists = []
    url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"

    while True:
        f = urllib2.urlopen(url)
        data = json.load(f)
        AL_artworks += data['rows']
        # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))
        # Stop: we are at the last page
        if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
            break
        url = data['feed_data']['next_page_link']

    # Now we have a list called 'AL_artworks' in which all the descriptions are stored.
    # We are going to put them into the mongoDB database,
    # making sure that if the artwork is already encoded (an object with the same id
    # is already in the database) we update the existing description instead of
    # inserting a new one ('upsert').
    # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))
    for artwork in AL_artworks:
        # Mongo does not like keys that have a dot in their name;
        # this property does not seem to be used anyway, so let us
        # delete it:
        if 'artworks.description2' in artwork:
            del artwork['artworks.description2']
        # upsert into the database:
        db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)
        # artwork['artist_id'] is not functioning properly
        db.AL_artists.update({"artist": artwork['artist']},
                             {"artist_sort": artwork['artist_sort'],
                              "artist": artwork['artist'],
                              "slug": slugify(artwork['artist'])},
                             upsert=True)
    # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
    return AL_artworks


if __name__ == "__main__":
    fetch_artworks()
【Answer 1】: First of all, you might like the requests library.
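As a minimal sketch of what that could look like (the fetch_image helper, the destination directory and the uuid-based filename are just placeholders I chose for illustration, not anything from the feed):

import os
import uuid

import requests

def fetch_image(url, dst_dir='/tmp/images'):
    # Stream the download so large images are not held in memory all at once.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    path = os.path.join(dst_dir, uuid.uuid4().hex)
    with open(path, 'wb') as fo:
        for chunk in response.iter_content(chunk_size=4096):
            fo.write(chunk)
    return path

# e.g. local_path = fetch_image(artwork['img_url'])

You could then store the returned local path alongside the other keys before upserting the artwork into Mongo.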
Otherwise, if you want to stick with the standard library, it would be something like:
import os
import uuid

def fetchfile(url, dst):
    fi = urllib2.urlopen(url)
    fo = open(dst, 'wb')
    while True:
        chunk = fi.read(4096)
        if not chunk:
            break
        fo.write(chunk)
    fo.close()
    fi.close()

fetchfile(
    data['feed_data']['next_page_link'],
    os.path.join('/var/www/static', uuid.uuid1().get_hex())
)
With proper exception catching, of course (I can expand on that if you like, but I trust the documentation will be clear enough).
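As a rough illustration only (the print-and-skip behaviour here is my own assumption, not part of the original answer), the call might be wrapped like this:

try:
    fetchfile(artwork['img_url'],
              os.path.join('/var/www/static', uuid.uuid1().get_hex()))
except urllib2.HTTPError as e:
    # The server answered, but with an error status (404, 500, ...).
    print "skipping %s: HTTP %s" % (artwork['img_url'], e.code)
except urllib2.URLError as e:
    # The request never completed (DNS failure, refused connection, ...).
    print "skipping %s: %s" % (artwork['img_url'], e.reason)
except IOError as e:
    # Writing the local file failed (permissions, disk full, ...).
    print "could not save image: %s" % e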
You could also put fetchfile() into a pool of asynchronous jobs to fetch several files at once.
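For example, a minimal sketch using the standard library's thread-based pool (multiprocessing.dummy); the image_urls list and the fetch_to_static helper are placeholders for whatever you collect from the feed:

import os
import uuid
from multiprocessing.dummy import Pool  # thread pool from the standard library

def fetch_to_static(url):
    dst = os.path.join('/var/www/static', uuid.uuid1().get_hex())
    fetchfile(url, dst)
    return dst

# image_urls would be gathered from the feed,
# e.g. [artwork['img_url'] for artwork in AL_artworks]
pool = Pool(8)  # 8 concurrent downloads
paths = pool.map(fetch_to_static, image_urls)
pool.close()
pool.join()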