Pythonanywhere returning BadStatusLine when I am trying to get request of Airbnb through Selenium

Posted 2023-03-13

[Title]: Pythonanywhere returning BadStatusLine when I am trying to get request of Airbnb through Selenium
[Posted]: 2018-04-17 22:35:14
[Question]:

Here is my code.

import requests
from bs4 import BeautifulSoup
import time
import datetime
from selenium import webdriver
import io
from pyvirtualdisplay import Display


neighbours = []
link_list = []  # listing URLs collected by get_links()

# Load neighbourhood names, stripping the URL-encoded "neighborhoods[]=" prefix.
with io.open('cntr_london.txt', "r", encoding="utf-8") as f:
    for q in f:
        neighbours.append(q.replace('neighborhoods%5B%5D=', '').replace('\n', ''))

#url = 'https://www.airbnb.com/s/paris/homes?room_types%5B%5D=Entire%20home%2Fapt&room_types%5B%5D=Private%20room&price_max=' +str(price_max)+ '&price_min=' + str(price_min)

def scroll_through_bottom():
    # Scroll down the page in 200-pixel steps.
    s = 0
    while s <= 4000:
        s += 200
        browser.execute_script('window.scrollTo(0, ' + str(s) + ');')


def get_links():
    link_data = browser.find_elements_by_class_name('_1szwzht')
    for link in link_data:
        link_tag = link.find_elements_by_tag_name('a')
        for l in link_tag:
            link_list.append(l.get_attribute("href"))

    length = len(link_list)
    print(length)  # function-call form works on both Python 2 and Python 3


with Display():

    browser = webdriver.Firefox()

    try:
        browser.get('http://airbnb.com')
    finally:
        browser.quit()

Every other URL works, but when I try to load Airbnb, I get this error:

Traceback (most recent call last):
  File "airbnb_new.py", line 43, in <module>
    browser.get('http://airbnb.com')
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 401, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 433, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

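On the other hand, when I try to run my code under Python 3, it gives me "no module named pyvirtualdisplay", even though I installed it with pip.

Can someone help me solve this? I would really appreciate it.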
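On the Python 3 import error mentioned above: a common cause is that pip installed the package for a different interpreter than the one running the script. The following is a minimal diagnostic sketch, not a confirmed fix for this case. The helper name `module_report` is hypothetical, and the `--user` suggestion assumes a shared host where packages are installed per user.

```python
import importlib.util
import sys


def module_report(name):
    """Describe the running interpreter and whether it can find module `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    report = module_report("pyvirtualdisplay")
    print(report)
    if not report["found"]:
        # Install with the same interpreter that runs the script.
        print("Try: %s -m pip install --user pyvirtualdisplay" % report["interpreter"])
```

Running this with the same `python3` command used for the scraper shows whether that interpreter can see the package at all. Installing through `python3 -m pip` ties the install to that interpreter.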

[Comments]:

Why not catch the exception and see what Airbnb's response is?

Can you tell me how I can see what is happening?

[Answer 1]:

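Airbnb has identified your scraper and, as a precaution, is rejecting requests from your spider. There is not much you can do about that directly. You can change your IP and system information and check whether that works, or you can wait a few hours and then check whether Airbnb's system has released the block and accepts requests from your system again.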
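As an illustration of "changing IP and system information", here is a hedged sketch that sets a custom user agent and a manual proxy through Firefox preferences. It assumes a recent Selenium release that provides `Options.set_preference`; the old Selenium and Firefox 17 setup from the question may need a different API. The helper names, the user-agent string, and `proxy.example.com` are hypothetical placeholders.

```python
def build_firefox_prefs(user_agent=None, proxy_host=None, proxy_port=None):
    """Build Firefox about:config preferences for a custom UA and a manual proxy."""
    prefs = {}
    if user_agent:
        prefs["general.useragent.override"] = user_agent
    if proxy_host and proxy_port:
        prefs["network.proxy.type"] = 1  # 1 = manual proxy configuration
        prefs["network.proxy.http"] = proxy_host
        prefs["network.proxy.http_port"] = int(proxy_port)
        prefs["network.proxy.ssl"] = proxy_host
        prefs["network.proxy.ssl_port"] = int(proxy_port)
    return prefs


def start_firefox(prefs):
    """Start Firefox with the given preferences (needs Selenium and Firefox)."""
    # Imported here so build_firefox_prefs works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

A call such as `start_firefox(build_firefox_prefs(user_agent="ExampleAgent/1.0", proxy_host="proxy.example.com", proxy_port=8080))` would return a browser object usable in place of `webdriver.Firefox()`. Whether this changes the outcome depends on the block really being based on IP or user agent, which the thread does not confirm.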

[Comments]:

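Can I change the system information on PythonAnywhere? Or is this happening because of the old version of Firefox?

Can I change the system and IP? In Firefox, I can change the proxy. Do I also need to change the user agent?

@MaheshKaria Can you help me understand how you reached the conclusion that "Airbnb has identified your scraper"?

Airbnb has not identified my scraper. I get responses from Facebook, Google and every other website, but it does not work for Airbnb. After some research, I understand that the Firefox driver version may be the cause. It is too old: version 17. On the other hand, there is no way to upgrade Firefox. So what should I do now?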
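To see what is happening, as the first comment on the question suggests, the page load can be wrapped so that each failure is logged and retried after a pause. This is a minimal sketch. It assumes the failure surfaces as an `http.client.HTTPException` (the Python 3 counterpart of the `httplib.BadStatusLine` in the traceback); newer Selenium releases may raise a different exception type. The helper name `load_with_retry` is hypothetical.

```python
import http.client
import time


def load_with_retry(load, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call load(url); on an HTTP-level failure, log it and retry after a pause.

    Returns True once load(url) succeeds, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            load(url)
            return True
        except http.client.HTTPException as exc:  # BadStatusLine is a subclass
            # repr() shows the exception class and the raw status line.
            print("attempt %d/%d failed: %r" % (attempt, attempts, exc))
            if attempt < attempts:
                sleep(delay * attempt)  # wait longer after each failure
    return False
```

With the question's code this would be called as `load_with_retry(browser.get, 'http://airbnb.com')`. The traceback shows the exception being raised in Selenium's `remote_connection.py` while reading the driver's reply, so a failure on every attempt would point toward the browser or driver rather than a temporary block.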
