Download Returned Zip file from URL

Posted 2023-02-16

【Title】: Download Returned Zip file from URL 【Posted】: 2012-03-14 05:15:34 【Question】:

If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?

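【Question comments】:

I tried the section of this page on downloading a binary file and writing it to disk, and it worked like a charm.

【Answer 1】:

An ultra-lightweight solution for saving a .zip file to a location on disk (using Python 3.9):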

import requests

url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)
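
A GET request can succeed and still return something other than the archive, for example an HTML login or warning page. The following is a minimal sketch, not part of the original answer, of checking the payload before writing it. The helper name `save_if_zip` is hypothetical, and the commented usage assumes the `url` and `output` variables from the snippet above.

```python
import io
import zipfile


def save_if_zip(payload: bytes, output_path: str) -> bool:
    """Write payload to output_path only if it parses as a zip archive.

    Returns True when the file was written, False otherwise.
    """
    # is_zipfile accepts a file-like object, so wrap the bytes in BytesIO.
    if not zipfile.is_zipfile(io.BytesIO(payload)):
        return False
    with open(output_path, "wb") as f:
        f.write(payload)
    return True


# Intended use with requests (not executed here):
#   r = requests.get(url, timeout=30)
#   r.raise_for_status()  # fail early on 4xx/5xx responses
#   if not save_if_zip(r.content, output):
#       print(r.headers.get("Content-Type"))  # inspect what the server sent
```

If the helper returns False, looking at the response's Content-Type header or the first few bytes of the body usually shows what the server actually sent.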

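【Comments】:

***.com/questions/68524210/…

@AtomStore yes? Is there an issue with my answer?

How do I bypass the alert? It downloads an HTML file instead of the zip.

My answer works for the link I tested. Try using my code, but replace the url with: api.os.uk/downloads/v1/products/CodePointOpen/… (open data from Ordnance Survey)

【Answer 2】:

Use the `requests`, `zipfile` and `io` Python packages.

In particular, the BytesIO class is used to hold the downloaded archive in memory rather than saving it to the drive.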

import requests
from zipfile import ZipFile
from io import BytesIO

r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))    
file = z.extract(a_file_to_extract, path_to_save)
with open(file) as f:
    print(f.read())
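
The snippet above still extracts the chosen member to disk before reading it. If the goal is to stay entirely in memory, `ZipFile.read` returns a member's bytes directly. A minimal sketch, assuming the archive bytes have already been downloaded (for example `r.content`); the helper name `read_member` is hypothetical.

```python
import io
import zipfile


def read_member(zip_bytes: bytes, member_name: str) -> bytes:
    """Return the raw bytes of one archive member without touching the disk."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # ZipFile.read raises KeyError if member_name is not in the archive.
        return archive.read(member_name)


# Intended use with requests (not executed here):
#   r = requests.get(zip_file_url)
#   text = read_member(r.content, a_file_to_extract).decode("utf-8")
```

This also covers the case where the archive holds several files and only one of them is needed.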

【Comments】:

【Answer 3】:

As far as I know, the proper way to do this is:

import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

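Of course, you'd want to check that the GET was successful with r.ok.

For Python 3+, swap the StringIO module for the io module and use BytesIO instead of StringIO: Here are the release notes that mention this change.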

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")

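【Comments】:

Thanks for this answer. I used it to solve my issue getting a zip file with requests.

yoavram, in your code, where do I enter the URL of the webpage?

If you'd like to save the downloaded file in a different location, replace z.extractall() with z.extractall("/path/to/destination_directory")

If you just want to save the file from the url, you can do: urllib.request.urlretrieve(url, filename).

To help others connect the dots it took me 60 minutes too long to figure out: you can then use pd.read_table(z.open('filename')) with the above. This is useful if you have a zip url link that contains multiple files and you're only interested in loading one.

【Answer 4】:

Most people recommend using requests if it is available, and the requests documentation recommends this approach for downloading and saving raw data from a URL: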

import requests 

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

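Since the question asks about downloading and saving the zip file, I haven't gone into details about reading the zip file. See one of the many answers below for possibilities.

If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.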

import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

Finally, if you are still using Python 2, you can use urllib2.urlopen.

import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

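【Comments】:

Could you please add a sample snippet as well? It would be very kind of you.

【Answer 5】:

I came here looking for how to save a .bzip2 file. Let me paste the code for others who might come looking for it.

import requests

url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"
headers = {}  # placeholder: the original snippet did not show which headers it sent

response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
    # Write the response body to disk unchanged.
    with open(filename, 'wb') as f:
        f.write(response.content)

I just wanted to save the file as-is.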

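【Comments】:

【Answer 6】:

Thanks to @yoavram for the above solution. My URL path linked to a zipped folder, and I ran into a BadZipFile error (file is not a zip file). Strangely, after a few tries the URL was retrieved and unzipped all of a sudden, so I amended the solution a little bit, using the is_zipfile method as described here.

import io
import zipfile

import requests

r = requests.get(url, stream=True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
# Note: this retries without limit until a valid zip arrives.
while not check:
    r = requests.get(url, stream=True)
    check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()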
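【Comments】:

【Answer 7】:

With the help of this blog post, I got it working with just requests. The point of the weird stream thing is that we don't need to call content on large requests, which would require the whole body to be processed at once and clog the memory. stream avoids this by iterating through the data one chunk at a time.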

import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
target_path = 'alaska.zip'

response = requests.get(url, stream=True)
# The with block closes the file even if the download fails midway.
with open(target_path, "wb") as handle:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            handle.write(chunk)
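
The loop above writes 512-byte chunks, which means many small writes for a large file. A minimal sketch of the same idea as a reusable helper is shown below. The name `write_chunks` and the 64 KiB chunk size in the commented usage are assumptions; the best chunk size depends on the connection and the file.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], target_path: str) -> int:
    """Write an iterable of byte chunks to target_path; return bytes written."""
    total = 0
    with open(target_path, "wb") as handle:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                total += handle.write(chunk)
    return total


# Intended use with requests (not executed here):
#   with requests.get(url, stream=True, timeout=30) as response:
#       response.raise_for_status()
#       write_chunks(response.iter_content(chunk_size=64 * 1024), target_path)
```

Because the helper only needs an iterable of bytes, it works with `iter_content` from requests or with any other chunked source.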

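【Comments】:

The substance of an answer should not depend on links. Links can go dead, or the content on the other side can change so that it no longer answers the question. Please edit your answer to include a summary or explanation of the information your link points to.

What is chunk_size here? And can this parameter affect download speed?

@ayushthakur Here are some links that may help: requests.Response.iter_content and wikipedia:Chunk Transfer Encoding. Someone else could probably give a better answer, but I wouldn't expect chunk_size to make a difference to download speed if it is set large enough (reducing the #pings/content ratio). In retrospect, 512 bytes seems super small.

【Answer 8】:

Here's what I got to work in Python 3: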

import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

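# Stream the response body to disk; the file is flushed and closed when this block exits.
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

# Open the archive only after the download has been fully written.
with zipfile.ZipFile(file_name) as zf:
    zf.extractall()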

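【Comments】:

Hello. How can I avoid this error: urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.?

@VictorHerasmePerez, an HTTP 302 response status code means that the page has been moved. I think the issue you're facing is addressed here: ***.com/questions/32569934/…

@Webucator If the zipped folder contains multiple files, all of them will be extracted and stored on the system. I want to extract and get just one file from the zipped folder. Is there any way to do this?

【Answer 9】:

Either use urllib2.urlopen, or you could try the excellent Requests module and avoid urllib2 headaches: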

import requests
results = requests.get('url')
#pass results.content onto secondary processing...

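【Comments】:

But how do you parse results.content into a zip?

Use the zipfile module: zip = zipfile.ZipFile(io.BytesIO(results.content)). ZipFile expects a path or a file-like object, so the raw bytes need to be wrapped in io.BytesIO first. Then just parse through the files using ZipFile.namelist(), ZipFile.open(), or ZipFile.extractall()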
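
As a small illustration of that secondary processing, the sketch below lists the names inside an archive held in memory. The helper name `list_zip_members` is hypothetical, and the commented usage assumes `results` comes from the `requests.get` call above.

```python
import io
import zipfile
from typing import List


def list_zip_members(content: bytes) -> List[str]:
    """Return the member names of a zip archive held in memory."""
    # ZipFile needs a path or file-like object, hence the BytesIO wrapper.
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return archive.namelist()


# Intended use with requests (not executed here):
#   for name in list_zip_members(results.content):
#       print(name)
```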
