Get around a 404 with mechanize

Posted 2023-02-23

[Title]: Get around a 404 with mechanize [Asked]: 2012-11-12 13:17:42 [Question]:

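I am writing a Python script that reads a file of URLs, but I know that not all of them will work. I am trying to figure out how to get around this and have the script move on to the next line of the file instead of raising the error I posted below. I know I need some kind of if statement, but I can't quite figure it out.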

from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import csv

me = open('C:\Python27\myfile.csv')
reader = csv.reader(me)
mech = Browser()

for url in me:
    response =  mech.open(url)
    html = response.read()
    soup = BeautifulSoup(html)
    table = soup.find("table", border=3)

for row in table.findAll('tr')[2:]:
    col = row.findAll('td')
    BusinessName = col[0].string
    Phone = col[1].string
    Address = col[2].string
    City = col[3].string
    State = col[4].string
    Zip = col[5].string
    Restaurantinfo = (BusinessName, Phone, Address, City, State)
    print "|".join(Restaurantinfo)

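When I run this block of code, it raises this error:

httperror_seek_wrapper: HTTP Error 404: Not Found

Basically, what I am asking is how to make Python ignore the error and try the next URL.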

[Answer 1]:

If your file contains only URLs, it would be simpler to write one URL per line and use code like this:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup


me = open('C:\Python27\myfile.csv')
mech = Browser()

for url in me.readlines():
    ...

If you want to keep your code, you have to use:

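for row in reader:  # csv.reader yields each line as a list of fields
    url = row[0]    # assumes the URL is in the first column
    ...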
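
The loops above only decide how the URLs are read; a 404 would still stop the script. What follows is a minimal sketch of skipping the URLs that fail, not a definitive implementation. It assumes that the error mechanize raises (httperror_seek_wrapper) derives from the standard HTTPError class (urllib2.HTTPError on Python 2, urllib.error.HTTPError on Python 3). The names read_pages and open_url are hypothetical and chosen only for illustration.

```python
try:
    from urllib.error import HTTPError, URLError  # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError  # Python 2, as used in the question


def read_pages(urls, open_url):
    """Yield (url, html) for every URL that opens; skip the ones that fail.

    urls     -- any iterable of URL strings, e.g. an open text file
    open_url -- a callable such as mech.open (hypothetical parameter name)
    """
    for line in urls:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # ignore blank lines
        try:
            response = open_url(url)
        except (HTTPError, URLError) as err:
            # Assumption: a 404 from mechanize arrives here as an HTTPError
            # subclass. Report it and move on to the next URL.
            print("skipping %s: %s" % (url, err))
            continue
        yield url, response.read()
```

With the objects from the question, the outer loop could then read: for url, html in read_pages(me, mech.open), with the BeautifulSoup parsing placed inside that loop. A try/except is used instead of an if statement because the failure is only known once the request has been made.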
