python spider.py

Preface: This article was compiled by the editors of 小常识网 (cha138.com). It presents the Python script spider.py and may serve as a useful reference.

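#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Scrape product IDs, titles, URLs and prices from a JD.com category listing.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Price endpoint: append a SKU id to get that product's price data as JSON.
price_url = 'http://p.3.cn/prices/get?skuid=J_'
# Category listing page to scrape.
list_url = 'https://list.jd.com/list.html?cat=12259,12260,9438'

r = requests.get(list_url, stream=True)

# Optional: save the raw listing page for offline inspection.
# with open('wine.html', 'wb') as f:
#     for chunk in r.iter_content(chunk_size=1024):
#         if chunk:
#             f.write(chunk)

soup = BeautifulSoup(r.text, 'html.parser')
# Each product is an <li class="gl-item"> inside the element with id="plist".
plist = soup.find(id='plist').find_all('li', 'gl-item')

for item in plist:
    # The first link in the item points at the product page. urljoin resolves
    # relative or protocol-relative hrefs against the listing URL.
    item_url = urljoin(list_url, item.find_all('a')[0]['href'])
    # The last path segment is '<id>.html', so drop the 5-character suffix.
    item_id = item_url.split('/')[-1][0:-5]
    item_title = item.find('div', 'p-name').get_text().strip()
    price_resp = requests.get(price_url + item_id)
    price_data = price_resp.json()
    print(item_id)
    print(item_title)
    print(item_url)
    print(price_data)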
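
The `[0:-5]` slice in the loop above only works when the link ends in exactly `.html`. As a minimal sketch, the link handling could be pulled into a small helper that can be checked without any network access. The helper name `parse_item_link`, the default `base_url`, and the example href are all hypothetical and not part of the original script; the `<id>.html` naming is an assumption inferred from that slice.

```python
import posixpath
from urllib.parse import urljoin, urlparse


# Hypothetical helper, not part of the original script.
def parse_item_link(href, base_url='https://list.jd.com/list.html'):
    """Return (absolute_url, item_id) for a product link.

    Assumes the product page is named '<id>.html'. This naming is
    inferred from the slicing in the script, not from any documented rule.
    """
    # Resolve relative and protocol-relative hrefs against the listing page.
    absolute_url = urljoin(base_url, href)
    # Take the last path segment and strip its extension, ignoring any query.
    filename = posixpath.basename(urlparse(absolute_url).path)
    item_id, _ext = posixpath.splitext(filename)
    return absolute_url, item_id


if __name__ == '__main__':
    # Made-up href; the id 1234567 is illustrative only.
    print(parse_item_link('//item.jd.com/1234567.html'))
```

Parsing the path instead of slicing a fixed number of characters keeps the ID intact if the link carries a query string or a different extension.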
