Can't parse coin gecko page from today with BeautifulSoup because of Cloudflare

Posted 2023-02-19

【Title】: Can't parse coin gecko page from today with BeautifulSoup because of Cloudflare 【Asked】: 2021-10-08 11:52:53 【Question】:

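from bs4 import BeautifulSoup as bs
import requests
import re
import cloudscraper

def get_btc_price(br):
  # Fetch the page with plain requests (no custom headers are sent)
  data=requests.get('https://www.coingecko.com/en/coins/bitcoin')

  soup = bs(data.text, 'html.parser')

  # Attribute filters must be passed as a dict
  price1=soup.find('table',{'class':'table b-b'})
  fclas=price1.find('td')

  spans=fclas.find('span')

  # Strip whitespace, drop the leading currency symbol, then convert
  price2=spans.text
  price=(price2).strip()
  x=float(price[1:])
  # br is a multiplier supplied by the caller
  y=x*br
  z=round(y,2)
  print(z)

  return z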

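This had been working for months, and this morning it stopped. The messages I get are: "Checking your browser before you continue...", "Check your antivirus or consult your manager to get access...", plus some Cloudflare gibberish.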

I tried

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
print(scraper.get("https://www.coingecko.com/en/coins/bitcoin").text)

It still blocks me. What should I do? Is there another way to get around this, or am I doing something wrong?

【Answer 1】:

This does not look like a problem with the scraper itself, but with how the server handles negotiating the connection.

Add a user agent; otherwise requests sends its default one (python-requests/<version>).

user_agent = # elided in the original: put a browser User-Agent string here
response = requests.get(url, headers={"user-agent": user_agent})
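
A minimal sketch of the suggestion above, under stated assumptions: the helper names `make_session` and `preview_user_agent` and the `EXAMPLE_USER_AGENT` string are illustrative and do not come from the original answer. The sketch only prepares the request, without sending it, so you can see which User-Agent would go on the wire.

```python
import requests

# Illustrative browser-like value (an assumption, not from the original answer).
# Copy the real string from your own browser's developer tools if possible.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
)


def make_session(user_agent=EXAMPLE_USER_AGENT):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def preview_user_agent(session, url):
    """Build the request without sending it and return its User-Agent header."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]


if __name__ == "__main__":
    url = "https://www.coingecko.com/en/coins/bitcoin"
    # What requests would send if no header were set
    print("default:", requests.utils.default_user_agent())
    # What the configured session will send instead
    print("custom: ", preview_user_agent(make_session(), url))
```

Using a `Session` here is a design choice rather than a requirement: it keeps the header in one place, so every later `session.get(...)` call reuses it.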

Check the "requirements" by printing the response headers

url = #
response = requests.get(url)
for key, value in response.headers.items():
  print(key, ":", value)
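
Building on the header dump above, here is a hedged sketch of a check for whether a response looks like a Cloudflare block page rather than the real content. The function name `looks_like_cloudflare_block`, the chosen status codes, and the text marker are all assumptions for illustration; this is a heuristic, not an official detection method.

```python
def looks_like_cloudflare_block(status_code, headers, body=""):
    """Heuristically decide whether a response is a Cloudflare block page.

    status_code: HTTP status as an int
    headers:     any mapping of header names to values
    body:        response text (optional)
    """
    # Header names are case-insensitive, so normalize before looking them up
    normalized = {key.lower(): value for key, value in headers.items()}

    # Responses passing through Cloudflare carry these headers
    served_by_cloudflare = (
        normalized.get("server", "").lower() == "cloudflare"
        or "cf-ray" in normalized
    )

    # Assumption: block and challenge pages usually come with one of these codes
    blocked_status = status_code in (403, 429, 503)

    # Assumption: the marker matches the message quoted in the question
    challenge_text = "checking your browser" in body.lower()

    return served_by_cloudflare and (blocked_status or challenge_text)
```

With requests, it could be called as `looks_like_cloudflare_block(response.status_code, response.headers, response.text)` before handing `response.text` to BeautifulSoup.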

【Comments】:

Hmm, interesting. After adding the user agent, it works now. I don't even need cloudscraper anymore. Thanks!
