Python Selenium: how to tell when a download has completed?

Posted 2021-04-02


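I start a download with selenium. Once the download completes, I need to take some further action. Is there a simple way to find out when the download has finished? (I am using the Firefox driver.)

Answer

There is no built-in way in selenium to wait for a download to complete.

The general idea here is to wait until the file appears in the "Downloads" directory.

This can be done by checking over and over again whether the file exists:

Check and wait until a file exists to read it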
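The polling idea above can be sketched as follows. This is a minimal illustration, not code from the linked answer: the function name `wait_for_file` and its `timeout` and `poll_interval` parameters are assumptions made for the example. Note that a browser may create the target file before it has finished writing it, so existence alone is only a rough signal.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll_interval=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        time.sleep(poll_interval)
    # One last check in case the file appeared during the final sleep.
    return os.path.isfile(path)
```

A caller would trigger the download with selenium and then call `wait_for_file` with the expected path inside the configured download directory.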

Alternatively, monitor the directory with something like watchdog.

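Another answer

I ran into this problem recently. I was downloading multiple files at once and had to build in a timeout in case a download failed.

The code checks the filenames in a download directory once per second and exits when the downloads are complete, or after 20 seconds if they take longer than that. The returned wait time is used to check whether the download succeeded or timed out.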

import time
import os

def download_wait(path_to_downloads):
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < 20:
        time.sleep(1)
        dl_wait = False
        for fname in os.listdir(path_to_downloads):
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds

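I believe this only works for files downloaded with Chrome, since its unfinished downloads end with the .crdownload extension. There may be a similar way to check in other browsers.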
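For other browsers, the same check can be sketched with a configurable set of temporary suffixes. This is an illustration only: the helper names are hypothetical, and the suffix list assumes Chrome's `.crdownload` and Firefox's `.part` partial files.

```python
import os
import time

# Assumed temporary suffixes: Chrome uses .crdownload, Firefox uses .part.
PARTIAL_SUFFIXES = (".crdownload", ".part")


def has_partial_downloads(directory, suffixes=PARTIAL_SUFFIXES):
    """Return True if any file in `directory` ends with a partial-download suffix."""
    return any(name.endswith(suffixes) for name in os.listdir(directory))


def wait_for_partials_to_clear(directory, timeout=20.0, poll_interval=1.0):
    """Return True when no partial files remain, or False on timeout."""
    deadline = time.monotonic() + timeout
    while has_partial_downloads(directory):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Returning a boolean rather than the elapsed seconds makes the timeout case explicit for the caller.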

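Edit: I recently changed the way I use this function, for cases where .crdownload does not show up as an extension. Essentially, this version also waits for the correct number of files.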

def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.

    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True

        # Iterate over the listed filenames, not the characters of the path string.
        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True

        seconds += 1
    return seconds

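Another answer

I know this answer comes very late, but I would like to share a hack for future readers.

From the main thread, you can create a thread, say thread1, and start the download in it. Then create another thread, say thread2, and have it wait for thread1 to finish by calling the join() method. After that, you can continue your execution flow once the download is complete.

Still, make sure you do not start the download with selenium; instead, use selenium to extract the link and download the file with the requests module.

Download using requests module

For example: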

import threading

def downloadit():
    # download code goes here (for example, with the requests module)
    pass

def after_dwn():
    dwn_thread.join()  # waits until the download thread has finished
    # the code to run after the download goes here

dwn_thread = threading.Thread(target=downloadit)
dwn_thread.start()

metadata_thread = threading.Thread(target=after_dwn)
metadata_thread.start()

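Another answer

When using Chrome, files that have not finished downloading have the extension .crdownload. If you have set your download directory correctly, you can wait until the file you want no longer has this extension. In principle, this is not very different from waiting for the file to exist (as suggested by alecxe), but at least you can monitor progress this way.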
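One way to monitor progress along these lines is to report the size of the partial file while it exists. The sketch below is an illustration under an assumption: it takes the partial file to be named after the target with `.crdownload` appended, which may not hold for every Chrome version or setting. The function name and its parameters are hypothetical.

```python
import os
import time


def wait_with_progress(target_path, timeout=60.0, poll_interval=1.0, report=print):
    """Wait until `target_path` exists and its .crdownload sibling is gone."""
    partial_path = target_path + ".crdownload"  # assumed naming scheme
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(partial_path):
            try:
                size = os.path.getsize(partial_path)
            except OSError:
                # The browser renamed the file between the two calls.
                continue
            report(f"downloading: {size} bytes so far")
        elif os.path.exists(target_path):
            return True
        time.sleep(poll_interval)
    return False
```

Passing a custom `report` callable (a logger method, for instance) keeps the progress output separate from the waiting logic.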

Another answer

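import os
import time

# Loop until no file in the directory still has the .crdownload extension.
downloading = True
while downloading:
    names = os.listdir("directorypath")
    downloading = any(name.endswith(".crdownload") for name in names)
    if downloading:
        time.sleep(1)  # avoid a busy loop while waiting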

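This approach works if you want to check whether a set of (multiple) files has finished downloading.

Another answer

As mentioned before, there is no native way to check whether a download has finished. So here is a helper function that does the job for both Firefox and Chrome. One trick is to clear the temporary download folder before starting a new download. It also uses the standard pathlib module so that it works across platforms.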

from pathlib import Path

def is_download_finished(temp_folder):
    firefox_temp_file = sorted(Path(temp_folder).glob('*.part'))
    chrome_temp_file = sorted(Path(temp_folder).glob('*.crdownload'))
    downloaded_files = sorted(Path(temp_folder).glob('*.*'))
    # The download counts as finished when no temporary files remain
    # and at least one file is present in the folder.
    return (len(firefox_temp_file) == 0
            and len(chrome_temp_file) == 0
            and len(downloaded_files) >= 1)
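The trick of clearing the folder first and then polling can be sketched as below. This is a hedged illustration rather than part of the original answer: `clear_folder`, `wait_until_finished`, and the `timeout` and `poll_interval` parameters are hypothetical names, and the patterns assume Firefox `.part` and Chrome `.crdownload` temporary files.

```python
import time
from pathlib import Path

TEMP_PATTERNS = ("*.part", "*.crdownload")  # Firefox and Chrome partial files


def clear_folder(folder):
    """Delete every regular file directly inside `folder`."""
    for entry in Path(folder).iterdir():
        if entry.is_file():
            entry.unlink()


def wait_until_finished(folder, timeout=30.0, poll_interval=0.5):
    """Return the finished files, or an empty list on timeout."""
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        partial = [p for pattern in TEMP_PATTERNS for p in folder.glob(pattern)]
        files = [p for p in folder.iterdir() if p.is_file()]
        if files and not partial:
            return sorted(files)
        time.sleep(poll_interval)
    return []
```

Calling `clear_folder` before triggering the download means any file found afterwards belongs to the new download, which is why the folder should be a dedicated temporary directory and not a shared one.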
