Python's requests library timing out but getting the response from the browser
Posted: 2018-04-02 10:17:55

Question: I'm trying to create a web scraper for NBA data. When I run the following code:
import requests
response = requests.get('https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=10%2F20%2F2017&DateTo=10%2F20%2F2017&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=')
the request times out with this error:
  File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', OSError("(10060, 'WSAETIMEDOUT')",))
However, when I open the same URL in my browser, I do get a response.
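requests applies no timeout unless you pass one. A request the server never answers can therefore wait until the operating system gives up, which is the WSAETIMEDOUT shown above. The sketch below assumes you would rather have the scraper fail fast with an exception you can catch. `fetch_stats` and the `(5, 30)` second connect/read timeout are illustrative choices and are not part of the original question:

```python
import requests

# Illustrative (connect, read) timeouts in seconds; tune for your own use.
DEFAULT_TIMEOUT = (5, 30)


def fetch_stats(url, headers=None, timeout=DEFAULT_TIMEOUT):
    """Hypothetical helper: GET `url`, raising instead of waiting indefinitely.

    Raises requests.exceptions.Timeout if connecting or reading takes longer
    than `timeout`, requests.exceptions.ConnectionError for other connection
    failures, and requests.exceptions.HTTPError for 4xx/5xx responses.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
    return response
```

A caller can then catch `requests.exceptions.Timeout` and log the failure or retry. A timeout only makes the failure explicit. It does not make the server respond; the header changes in the answers address that.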
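Answer 1: It looks like the website you mentioned checks the "User-Agent" in the request headers. You can fake the "User-Agent" in your request so that it looks like it comes from a real browser, and then you will get a response.
For example: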
import requests
url = "https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=10%2F20%2F2017&DateTo=10%2F20%2F2017&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
# it's the user-agent of my browser ^
response = requests.get(url, headers=headers)
response.status_code # will return: 200
response.text # will return the website content
You can find your browser's user agent here.
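You can check what the server actually receives by preparing a request without sending it. The sketch below assumes requests is installed. Out of the box, requests identifies itself as `python-requests/<version>`, and that is presumably what this site refuses to answer. The answer infers this; the site does not document it. Passing `headers=` replaces that value. `prepared_user_agent` is a hypothetical helper name, and `BROWSER_UA` is the string from the answer above:

```python
import requests

# The browser User-Agent string from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)
URL = "https://stats.nba.com/stats/leaguedashplayerstats"


def prepared_user_agent(headers=None):
    """Hypothetical helper: the User-Agent a Session would send, with no network I/O."""
    with requests.Session() as session:
        request = requests.Request("GET", URL, headers=headers)
        # prepare_request merges the session's default headers with ours;
        # headers we pass explicitly take precedence over the defaults.
        prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


default_ua = prepared_user_agent()                            # python-requests/<version>
browser_ua = prepared_user_agent({"User-Agent": BROWSER_UA})  # the browser string
print(default_ua)
print(browser_ua)
```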
Comment: Boom! You can find your own User-Agent value here.

Answer 2: If it still doesn't work, use these headers:
# requests decodes 'br' (Brotli) responses only if the brotli or brotlicffi package is installed
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8'}
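A scraper that makes many calls (one per date, for instance) can use a `requests.Session` to set these headers once and reuse the connection. The sketch below assumes that use case. `make_session` is a hypothetical helper, and the header values are the ones from this answer:

```python
import requests

# Headers from this answer; every request made through the session sends them.
ANSWER2_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    ),
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9,hi;q=0.8",
}


def make_session(headers=None):
    """Hypothetical helper: a Session whose default headers mimic a browser."""
    session = requests.Session()
    # update() overrides requests' own defaults (including its User-Agent)
    # and keeps the defaults we don't mention, such as Accept and Connection.
    session.headers.update(ANSWER2_HEADERS if headers is None else headers)
    return session


session = make_session()
# Inspect what would be sent, without any network I/O.
prepared = session.prepare_request(
    requests.Request("GET", "https://stats.nba.com/stats/leaguedashplayerstats")
)
print(prepared.headers["User-Agent"])
```

A real call would then look like `session.get(url, timeout=30)`. Headers passed to an individual `session.get(..., headers=...)` call are merged on top of the session's defaults.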
Answer 3: If the other headers don't work, try these; they worked well for me.
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15", "Accept-Language": "en-gb", "Accept-Encoding": "br, gzip, deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Referer": "http://www.google.com/"}
I collected these headers from this link.
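Taken together, the answers amount to trying progressively more browser-like header sets until one works. The sketch below automates that trial and error, assuming "works" means an HTTP 200 within the timeout. `first_working_headers` is a hypothetical helper. `session` may be any object with a requests-style `get(url, headers=..., timeout=...)` method and defaults to a fresh `requests.Session`:

```python
import requests


def first_working_headers(url, candidates, session=None, timeout=(5, 30)):
    """Hypothetical helper: return (headers, response) for the first candidate giving HTTP 200.

    Each candidate is a dict of request headers, tried in order. Timeouts and
    other request errors move on to the next candidate. Returns (None, None)
    if no candidate worked.
    """
    session = session if session is not None else requests.Session()
    for headers in candidates:
        try:
            response = session.get(url, headers=headers, timeout=timeout)
        except requests.exceptions.RequestException:
            continue  # e.g. a timeout like the WSAETIMEDOUT in the question
        if response.status_code == 200:
            return headers, response
    return None, None
```

For example, `first_working_headers(url, [answer1_headers, answer2_headers, answer3_headers])`, where the three names are placeholders for the dictionaries shown in the answers.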