import-im6.q16: not authorized error 'os' @ error/constitute.c/WriteImage/1037 for a Python web scraper

【Posted】: 2019-08-28 17:28:13 【Question】:

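I wrote a simple web scraper for a comic site. I'm running it on Ubuntu (Linux ubuntu 4.18.0-16-generic #17~18.04.1-Ubuntu), but when I execute the script directly (permissions set with chmod ug+x), I keep getting a series of errors about importing system libraries, followed by a confusing syntax error: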

import-im6.q16: not authorized `time' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `sys' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `re' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `requests' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/bs4
./poorlywrittenscraper.py: line 15: DEFAULT_DIR_NAME: command not found
./poorlywrittenscraper.py: line 16: syntax error near unexpected token `('
./poorlywrittenscraper.py: line 16: `COMICS_DIRECTORY = os.path.join(os.getcwd(), DEFAULT_DIR_NAME)'

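Interestingly, when I run the same script via python3, it starts, creates the folder, fetches the images, but... doesn't save them. o.O

Any idea what I'm missing here or how to fix this?

Here's the full code of the script: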

"""
A simple image downloader for poorlydrawnlines.com/archive
"""
import time
import os
import sys
import re
import concurrent.futures


import requests
from bs4 import BeautifulSoup as bs


DEFAULT_DIR_NAME = "poorly_created_folder"
COMICS_DIRECTORY = os.path.join(os.getcwd(), DEFAULT_DIR_NAME)


LOGO = """
a Python comic(al) scraper for poorlydrawnlines.com
                         __
.-----.-----.-----.----.|  |.--.--.
|  _  |  _  |  _  |   _||  ||  |  |
|   __|_____|_____|__|  |__||___  |
|__|                        |_____|
                __ __   __
.--.--.--.----.|__|  |_|  |_.-----.-----.
|  |  |  |   _||  |   _|   _|  -__|     |
|________|__|  |__|____|____|_____|__|__|
.-----.----.----.---.-.-----.-----.----.
|__ --|  __|   _|  _  |  _  |  -__|   _|
|_____|____|__| |___._|   __|_____|__|
                      |__|
version: 0.4 | author: baduker | https://github.com/baduker
"""


ARCHIVE_URL = "http://www.poorlydrawnlines.com/archive/"
COMIC_PATTERN = re.compile(r'http://www.poorlydrawnlines.com/comic/.+')

def download_comics_menu(comics_found):
    """
    Main download menu, takes number of available comics for download
    """
    print("\nThe scraper has found {} comics.".format(len(comics_found)))
    print("How many comics do you want to download?")
    print("Type 0 to exit.")

    while True:
        try:
            comics_to_download = int(input(">> "))
        except ValueError:
            print("Error: expected a number. Try again.")
            continue
        if comics_to_download > len(comics_found) or comics_to_download < 0:
            print("Error: incorrect number of comics to download. Try again.")
            continue
        elif comics_to_download == 0:
            sys.exit()
        return comics_to_download


def grab_image_src_url(session, url):
    """
    Fetches urls with the comic image source
    """
    response = session.get(url)
    soup = bs(response.text, 'html.parser')
    for i in soup.find_all('p'):
        for img in i.find_all('img', src=True):
            return img['src']


def download_and_save_comic(session, url):
    """
    Downloads and saves the comic image
    """
    file_name = url.split('/')[-1]
    with open(os.path.join(COMICS_DIRECTORY, file_name), "wb") as file:
        response = session.get(url)
        file.write(response.content)


def fetch_comics_from_archive(session):
    """
    Grabs all urls from the poorlydrawnlines.com/archive and parses for only those that link to published comics
    """
    response = session.get(ARCHIVE_URL)
    soup = bs(response.text, 'html.parser')
    comics = [url.get("href") for url in soup.find_all("a")]
    return [url for url in comics if COMIC_PATTERN.match(url)]


def download_comic(session, url):
    """
    Download progress information
    """
    print("Downloading: {}".format(url))
    url = grab_image_src_url(session, url)
    download_and_save_comic(session, url)


def main():
    """
    Encapsulates and executes all methods in the main function
    """
    print(LOGO)

    session = requests.Session()

    comics = fetch_comics_from_archive(session)
    comics_to_download = download_comics_menu(comics)

    try:
        os.mkdir(DEFAULT_DIR_NAME)
    except OSError as exc:
        sys.exit("Failed to create directory (error_no {})".format(exc.errno))

    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(lambda url: download_comic(session, url), comics[:comics_to_download])
    executor.shutdown()
    end = time.time()
    print("Finished downloading {} comics in {:.2f} sec.".format(comics_to_download, end - start))

if __name__ == "__main__":
    main()

【Question comments】:

【Answer 1】:

I'm pretty sure you're missing a shebang at the beginning of the file, for example one of:

#!/usr/bin/env python3
#!/usr/bin/env python2
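Why this matters: when an executable file has no shebang, the shell runs it as a shell script. The Python `import` lines are then treated as calls to ImageMagick's `import` command (installed on Ubuntu as `import-im6.q16`), and `from` as a mail utility, which is consistent with the errors in the question. As a minimal sketch, a script can be checked for a shebang like this; the helper name `has_shebang` and the demo file names are made up for illustration and are not part of the original script:

```python
import os
import tempfile


def has_shebang(path):
    """Return True if the file at `path` starts with the two bytes '#!'."""
    with open(path, "rb") as script:
        return script.read(2) == b"#!"


def write_demo_script(directory, name, first_line):
    """Create a small demo script (hypothetical content) and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as script:
        script.write(first_line + "\nimport os\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with_line = write_demo_script(tmp, "with.py", "#!/usr/bin/env python3")
        without_line = write_demo_script(tmp, "without.py", '"""docstring"""')
        print(has_shebang(with_line), has_shebang(without_line))
```

Running the script explicitly as `python3 poorlywrittenscraper.py` sidesteps the issue, because the interpreter is then named on the command line and the shell never tries to interpret the file itself.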

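【Discussion】:

You're right about the #!, but it only partially solves the problem. I can now run the script normally, but for some reason it still doesn't write the files to disk, even though it downloads them.

I guess this part, with open(os.path.join(COMICS_DIRECTORY, file_name), "wb") as file: response = session.get(url) file.write(response.content), is the one you mentioned that isn't working? In that case I'd try checking everything: that the file is created and opened properly, that the get request succeeds, and so on... Also, maybe an obvious question, but do you have permission to write to the destination folder?

Yes, I do have permission to execute the script and to write to any folder. I'll check the methods and get back to you. Thanks for the feedback!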
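One way to "check everything" in a threaded script: `executor.map()` only re-raises a worker's exception when its result is read, so if the results are never iterated, a failing download can pass unnoticed. Whether that is what happens in the script above is an assumption, not something the thread confirms. The sketch below uses `submit()` and inspects every future; `run_all` and `flaky_worker` are hypothetical names, and `flaky_worker` merely stands in for `download_comic(session, url)`:

```python
import concurrent.futures


def run_all(worker, items):
    """Run worker(item) in a thread pool and return a list of (item, error).

    Every future is checked explicitly, so an exception raised inside a
    worker thread is reported instead of being silently dropped.
    """
    failures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {executor.submit(worker, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            error = future.exception()
            if error is not None:
                failures.append((futures[future], error))
    return failures


def flaky_worker(url):
    # Hypothetical stand-in for download_comic(session, url):
    # fails when no image URL was found for a comic page.
    if url is None:
        raise TypeError("no image URL found on the page")
    return url


if __name__ == "__main__":
    for item, error in run_all(flaky_worker, ["a.png", None, "b.png"]):
        print("Failed: {!r} -> {!r}".format(item, error))
```

In the real script, calling `response.raise_for_status()` on the `requests` response before writing the file would turn HTTP errors into exceptions that this kind of loop can then report.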
