Can't bypass cloudflare with python cloudscraper

Posted 2023-02-23


【Title】: Can't bypass cloudflare with python cloudscraper 【Published】: 2021-04-12 17:13:38 【Question】:

I ran into a Cloudflare problem while trying to parse a website.

I have this code:

import cloudscraper

url = "https://author.today"
scraper = cloudscraper.create_scraper()
print(scraper.post(url).status_code)

Running it fails with:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

I searched for a workaround but could not find any solution. If you visit the site in a browser, you can see:

Checking your browser before accessing author.today.

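Is there any way to bypass Cloudflare in my case?

【Comments】:

The exception message hints at the solution: "not available in the opensource (free) version", so pay for it.

Apparently there is no paid version. But the documentation states: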

Cloudflare modifies their anti-bot protection page occasionally, So far it has changed maybe once per year on average.  If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.

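It suddenly stopped working for me, so I think they changed their strategy.

Interestingly, even when I copy the Chrome request as a curl command and resend it from the same IP (with all cookies), it does not seem to fool Cloudflare. I wonder why that is, and how Cloudflare tells my browser apart from cURL when both send the same request. (Note: copying the request headers used to work, but it no longer does.)

The exception does contain a hint, but I did not find any non-free version.

【Answer 1】:

You can use Playwright with WebKit: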

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto("https://author.today")
    page.wait_for_timeout(10000)
    print(page.content())
    browser.close()
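
The fixed 10-second pause above may be longer or shorter than the challenge actually takes. A minimal sketch of an alternative, assuming the interstitial still contains the "Checking your browser before accessing" text quoted in the question: poll the page content until that text disappears. The names `wait_until_challenge_clears` and `fetch_with_playwright` are hypothetical helpers, not part of Playwright, and `page.content()` may raise while a navigation is in progress, so a real script might need to retry around it.

```python
import time
from typing import Callable

# Text shown on the interstitial page, as quoted in the question (an assumption:
# Cloudflare may change this wording at any time).
CHALLENGE_MARKER = "Checking your browser before accessing"


def wait_until_challenge_clears(
    get_html: Callable[[], str],
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_html() until the marker text is gone, else raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        html = get_html()
        if CHALLENGE_MARKER not in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("challenge page still present after timeout")
        sleep(poll_s)


def fetch_with_playwright(url: str) -> str:
    """Hypothetical wrapper around the answer's code (requires playwright)."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.webkit.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # wait_for_timeout takes milliseconds and keeps the page running.
            return wait_until_challenge_clears(
                page.content,
                sleep=lambda seconds: page.wait_for_timeout(seconds * 1000),
            )
        finally:
            browser.close()
```

Calling `fetch_with_playwright("https://author.today")` would then return the HTML once the marker is gone, or raise `TimeoutError` if the challenge never clears.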

【Discussion】:

【Answer 2】:

Although it does not seem to work for this site, passing some browser parameters when creating the scraper sometimes helps:

import cloudscraper

url = "https://author.today"
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)
print(scraper.post(url).status_code)
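
Since a single profile may or may not help, one option is to try a few in turn and record what each one yields. The following is only a sketch under stated assumptions: `probe_profiles` and `default_factory` are hypothetical helpers, the second profile's keys are an assumption based on the cloudscraper README rather than on this answer, and, as noted above, none of them may get past a version 2 challenge. It uses `get` rather than `post` because it only fetches the page.

```python
from typing import Callable, List, Tuple, Union

# First profile is the one from the answer above; the second is an assumed
# alternative using option names from the cloudscraper README.
PROFILES = [
    {"browser": "chrome", "platform": "android", "desktop": False},
    {"browser": "firefox", "platform": "windows", "mobile": False},
]


def default_factory(profile: dict):
    """Build a real scraper for one profile (requires cloudscraper)."""
    import cloudscraper  # imported lazily so the helper below stays testable

    return cloudscraper.create_scraper(browser=profile)


def probe_profiles(
    url: str,
    profiles: List[dict],
    make_scraper: Callable[[dict], object] = default_factory,
) -> List[Tuple[dict, Union[int, str]]]:
    """Return (profile, outcome) pairs: an HTTP status code or an exception name."""
    results = []
    for profile in profiles:
        try:
            outcome = make_scraper(profile).get(url).status_code
        except Exception as exc:  # deliberately broad: this is only a probe
            outcome = type(exc).__name__
        results.append((profile, outcome))
    return results
```

For example, `probe_profiles("https://author.today", PROFILES)` would list, per profile, either the status code or the name of the exception raised (such as `CloudflareChallengeError`).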

【Discussion】:
