cURL method in Python for JSON feed [duplicate]

【Title】: cURL method in Python for JSON feed [duplicate] 【Posted】: 2015-10-04 07:18:38 【Question】:

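While building a Flask website, I'm using an external JSON feed to populate a local MongoDB with content. The feed is parsed and inserted, reusing the keys from the JSON as the keys in Mongo.

One of the keys available in the feed is called "img_url", and it contains (guess what) the URL of an image.

Is there a way to mimic PHP-style cURL in Python? I'd like to take that key, download the image and store it somewhere locally, while keeping the other associated keys, and have that as an entry in my database.

Here is my script so far: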

    import json
    import sys
    import urllib2
    from datetime import datetime

    import pymongo
    import pytz

    from utils import slugify
    # from utils import logger

    client = pymongo.MongoClient()
    db = client.artlogic


    def fetch_artworks():
        # logger.debug("downloading artwork data from Artlogic")

        AL_artworks = []
        AL_artists = []
        url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"

        # Follow the feed's pagination until the last page has been read
        while True:
            f = urllib2.urlopen(url)
            data = json.load(f)

            AL_artworks += data['rows']

            # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))

            # Stop when we are at the last page
            if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
                break

            url = data['feed_data']['next_page_link']

        # Now we have a list called 'AL_artworks' in which all the descriptions are stored.
        # We are going to put them into the mongoDB database,
        # making sure that if the artwork is already encoded (an object with the same id
        # already is in the database) we update the existing description instead of
        # inserting a new one ('upsert').

        # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))

        for artwork in AL_artworks:
            # Mongo does not like keys that have a dot in their name,
            # this property does not seem to be used anyway so let us
            # delete it:
            if 'artworks.description2' in artwork:
                del artwork['artworks.description2']
            # upsert into the database:
            db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)

            # artwork['artist_id'] is not functioning properly
            db.AL_artists.update({"artist": artwork['artist']},
                                 {"artist_sort": artwork['artist_sort'],
                                  "artist": artwork['artist'],
                                  "slug": slugify(artwork['artist'])},
                                 upsert=True)

        # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
        return AL_artworks


    if __name__ == "__main__":
        fetch_artworks()

【Question comments】:

【Answer 1】:

First of all, you might like the requests library.
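As a rough illustration of the requests route, here is a minimal sketch of a streamed download. It assumes Python 3 and that requests is installed; the function name `download_with_requests` and the optional `session` parameter are made up for this example and are not part of the original answer.

```python
def download_with_requests(url, dst, session=None, chunk_size=4096):
    """Stream `url` into the file `dst` and return the number of bytes written."""
    if session is None:
        # Imported lazily so this module still loads where requests is missing.
        import requests
        session = requests.Session()

    written = 0
    # stream=True keeps the whole image from being loaded into memory at once.
    with session.get(url, stream=True, timeout=10) as response:
        # Raise on 4xx/5xx rather than saving an error page as an image.
        response.raise_for_status()
        with open(dst, "wb") as fo:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fo.write(chunk)
                written += len(chunk)
    return written
```

Passing an existing `requests.Session` lets many downloads reuse one connection pool, which is likely to matter when a feed lists many images on the same host.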

Otherwise, if you want to stick with the standard library, it would be:

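    import os
    import uuid
    import urllib2


    def fetchfile(url, dst):
        # Copy the response to disk in 4 KiB chunks
        fi = urllib2.urlopen(url)
        try:
            with open(dst, 'wb') as fo:
                while True:
                    chunk = fi.read(4096)
                    if not chunk:
                        break
                    fo.write(chunk)
        finally:
            fi.close()


    # To download an image from the feed, pass artwork['img_url'] as the URL instead
    fetchfile(
        data['feed_data']['next_page_link'],
        os.path.join('/var/www/static', uuid.uuid1().get_hex())
    )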

Add proper exception handling (I can elaborate if you like, but I'm sure the documentation will be clear enough).
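One possible shape for that exception handling is sketched below. It assumes Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`; the name `fetchfile_safe`, the `timeout` default and the choice to return a boolean are illustrative assumptions, not something the answer prescribes.

```python
import urllib.error
import urllib.request


def fetchfile_safe(url, dst, chunk_size=4096, timeout=10):
    """Download `url` to `dst`; return True on success, False on a network or file error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as fi, open(dst, "wb") as fo:
            while True:
                chunk = fi.read(chunk_size)
                if not chunk:
                    break
                fo.write(chunk)
    except (urllib.error.URLError, OSError) as exc:
        # URLError covers unreachable hosts and HTTP errors (HTTPError is a subclass);
        # OSError covers timeouts and failures opening or writing the destination.
        print("could not fetch %s: %s" % (url, exc))
        return False
    return True
```

A download that fails midway can leave a partial file at `dst`, so a caller may want to delete it when the function returns False.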

You can put fetchfile() into a pool of asynchronous jobs to fetch multiple files at once.
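The pool idea could look roughly like this. The sketch assumes Python 3 and uses `multiprocessing.pool.ThreadPool`, since downloads spend most of their time waiting on the network; `fetch_many`, the `workers` default and the `(url, dst)` job format are assumptions made for the example.

```python
from multiprocessing.pool import ThreadPool
import urllib.request


def fetchfile(url, dst):
    """Copy the resource at `url` into the file `dst` in 4 KiB chunks."""
    with urllib.request.urlopen(url) as fi, open(dst, "wb") as fo:
        while True:
            chunk = fi.read(4096)
            if not chunk:
                break
            fo.write(chunk)
    return dst


def fetch_many(jobs, workers=4):
    """Download every (url, dst) pair in `jobs` concurrently; return the dst paths in order."""
    with ThreadPool(workers) as pool:
        # starmap unpacks each (url, dst) tuple into fetchfile(url, dst)
        # and blocks until every job has finished.
        return pool.starmap(fetchfile, jobs)
```

An exception raised inside any single download propagates out of `starmap`, so combining this with per-file error handling is probably worthwhile for a real feed.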

https://docs.python.org/2/library/json.html https://docs.python.org/2/library/urllib2.html https://docs.python.org/2/library/tempfile.html https://docs.python.org/2/library/multiprocessing.html
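To tie this back to the question's goal of keeping the other feed keys alongside the downloaded image, here is a minimal sketch assuming Python 3. The function `attach_local_image`, the `static_dir` parameter and the `img_local` key are hypothetical names chosen for illustration; only `img_url` comes from the feed described in the question.

```python
import os
import urllib.parse
import urllib.request
import uuid


def attach_local_image(artwork, static_dir):
    """Download artwork['img_url'] into static_dir and return a copy of the
    document with the saved path stored under 'img_local' (hypothetical key)."""
    url = artwork["img_url"]
    # Take the extension from the URL path only, ignoring any query string.
    ext = os.path.splitext(urllib.parse.urlsplit(url).path)[1]
    # A random file name avoids collisions between artworks.
    dst = os.path.join(static_dir, uuid.uuid4().hex + ext)
    with urllib.request.urlopen(url) as fi, open(dst, "wb") as fo:
        fo.write(fi.read())
    doc = dict(artwork)      # leave the caller's dict untouched
    doc["img_local"] = dst   # every other feed key is preserved as-is
    return doc
```

The returned document could then be handed to the upsert in the question's script in place of `artwork`, so the local path is stored together with the rest of the entry.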

【Comments】:
