Qunar Hotel Scraper
Posted by zwp-627
Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It covers scraping hotel data from Qunar, and we hope it serves as a useful reference.
Fetching Qunar hotel data
import json

import requests

URL = 'https://hotel.qunar.com/napi/list'
# Request payload for the hotel list endpoint
data = {
    "b": {
        "bizVersion": "17",
        "cityUrl": "beijing_city",
        "fromDate": "2020-03-07",
        "toDate": "2020-03-08",
        "q": "",
        "qFrom": 3,
        "start": 20,
        "num": 20,
        "minPrice": 0,
        "maxPrice": -1,
        "level": "",
        "sort": 0,
        "cityType": 1,
        "fromForLog": 1,
        "uuid": "",
        "userName": "",
        "userId": "",
        "fromAction": "",
        "searchType": 0,
        "locationAreaFilter": [],
        "comprehensiveFilter": []
    },
    "qrt": "h_hlist",
    "source": "website"
}
headers = {
    'authority': 'hotel.qunar.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'accept': 'application/json, text/plain, */*',
    'sec-fetch-dest': 'empty',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'content-type': 'application/json;charset=UTF-8',
    'origin': 'https://hotel.qunar.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'referer': 'https://hotel.qunar.com/cn/beijing_city/?fromDate=2020-03-06&toDate=2020-03-07&cityName=%E5%8C%97%E4%BA%AC',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
}
# First attempt: pass the payload dict directly as data
resp = requests.post(URL, headers=headers, data=data)
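This request returns HTTP 400.
On investigation, the cause is the body format: when a dict is passed as data, requests form-encodes it as key=value pairs, but the endpoint expects the payload as a JSON string, which is also what the content-type: application/json header announces.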
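To make the difference concrete, here is a small standard-library sketch (an illustration, not the requests internals) contrasting a form-encoded body with a JSON body for the two flat top-level fields of the payload. The nested "b" object is left out on purpose: key=value form encoding has no standard way to carry a nested object, which is one reason the JSON body matters here.

```python
import json
from urllib.parse import urlencode

# Only the flat top-level fields of the payload; "b" is omitted because
# form encoding has no standard representation for a nested object.
fields = {"qrt": "h_hlist", "source": "website"}

form_body = urlencode(fields)   # shape of a form-encoded POST body
json_body = json.dumps(fields)  # shape of the JSON body the endpoint expects

print(form_body)  # qrt=h_hlist&source=website
print(json_body)  # {"qrt": "h_hlist", "source": "website"}
```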
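data = json.dumps(data)  # serialize the payload to a JSON string
resp = requests.post(URL, headers=headers, data=data)
With the JSON string as the body, the request returns 200.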
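If you would rather avoid the requests dependency, the same JSON POST can be built with urllib.request from the standard library. This is a sketch under stated assumptions: build_list_request is a hypothetical helper, the payload simply mirrors the fields shown above, and start = page * num assumes "start" is a zero-based result offset (so start=20, num=20 would be the second page), which is a guess rather than documented behavior. The request is only built, not sent.

```python
import json
import urllib.request

LIST_URL = "https://hotel.qunar.com/napi/list"


def build_list_request(city_url, from_date, to_date, page=0, num=20, headers=None):
    """Build (but do not send) a POST request for one page of hotel results.

    Hypothetical helper: the payload mirrors the fields in the original post,
    and start = page * num ASSUMES "start" is a zero-based offset.
    """
    payload = {
        "b": {
            "bizVersion": "17",
            "cityUrl": city_url,
            "fromDate": from_date,
            "toDate": to_date,
            "q": "",
            "qFrom": 3,
            "start": page * num,
            "num": num,
            "minPrice": 0,
            "maxPrice": -1,
            "level": "",
            "sort": 0,
            "cityType": 1,
            "fromForLog": 1,
            "uuid": "",
            "userName": "",
            "userId": "",
            "fromAction": "",
            "searchType": 0,
            "locationAreaFilter": [],
            "comprehensiveFilter": [],
        },
        "qrt": "h_hlist",
        "source": "website",
    }
    # The body must be a JSON string (encoded to bytes), not a form-encoded dict.
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"content-type": "application/json;charset=UTF-8"}
    all_headers.update(headers or {})
    return urllib.request.Request(LIST_URL, data=body, headers=all_headers, method="POST")


req = build_list_request("beijing_city", "2020-03-07", "2020-03-08", page=1)
# Sending it would be urllib.request.urlopen(req), which needs network access.
```

If you stay with requests, requests.post(URL, headers=headers, json=payload) performs the JSON serialization for you and sets a JSON content-type when none is supplied.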
Both referer and content-type must be present in headers.
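The referer in the original headers is the city's hotel list page, with the dates and the URL-encoded city name in the query string. A hypothetical helper that rebuilds it for other cities and dates could look like the sketch below; the /cn/<cityUrl>/ path pattern is inferred from the single example in this post, not from any documentation, and make_referer and required_headers are illustrative names.

```python
from urllib.parse import urlencode


def make_referer(city_url, city_name, from_date, to_date):
    """Rebuild the list-page referer (path pattern inferred from one example)."""
    query = urlencode({"fromDate": from_date, "toDate": to_date, "cityName": city_name})
    return f"https://hotel.qunar.com/cn/{city_url}/?{query}"


def required_headers(city_url, city_name, from_date, to_date):
    """Minimal headers: the post reports referer and content-type as mandatory."""
    return {
        "referer": make_referer(city_url, city_name, from_date, to_date),
        "content-type": "application/json;charset=UTF-8",
    }


print(make_referer("beijing_city", "北京", "2020-03-06", "2020-03-07"))
# https://hotel.qunar.com/cn/beijing_city/?fromDate=2020-03-06&toDate=2020-03-07&cityName=%E5%8C%97%E4%BA%AC
```

For Beijing this reproduces the referer used in the original headers exactly; other cities would need their own cityUrl slug and Chinese name.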