python spider.py
Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It mainly covers python spider.py, and we hope it is of some reference value to you.
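#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Scrape a JD.com category listing and look up the price of each item.
import requests
from bs4 import BeautifulSoup

# Price endpoint; the item id is appended after the "J_" prefix.
price_url = 'http://p.3.cn/prices/get?skuid=J_'
list_url = 'https://list.jd.com/list.html?cat=12259,12260,9438'

r = requests.get(list_url, stream=True)
# Optional: save the raw page to disk for offline inspection.
# with open('wine.html', 'wb') as f:
#     for chunk in r.iter_content(chunk_size=1024):
#         if chunk:
#             f.write(chunk)

soup = BeautifulSoup(r.text, 'html.parser')
# Each product is an <li class="gl-item"> inside the element with id="plist".
plist = soup.find(id='plist').find_all('li', 'gl-item')

for item in plist:
    # The href is assumed to be protocol-relative ("//host/<id>.html"),
    # so prepend the scheme, including the colon.
    item_url = 'http:' + item.find_all('a')[0]['href']
    # The item id is the file name without its ".html" suffix.
    item_id = item_url.split('/')[-1][0:-5]
    item_title = item.find('div', 'p-name').get_text().strip()

    # One price request per item.
    r = requests.get(price_url + item_id)
    price_data = r.json()

    print(item_id)
    print(item_title)
    print(item_url)
    print(price_data)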
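The script builds the item URL by string concatenation and gets the id by slicing off the last five characters, which only works if every link is protocol-relative and ends in ".html". A minimal sketch of a more tolerant version, using only the standard library, might look like the following. The helper names `normalize_item_url` and `extract_item_id` and the sample link `//item.jd.com/12345.html` are hypothetical and chosen for illustration; they do not come from the original script.

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Assumption: links are resolved relative to the list page used in the script.
LIST_PAGE = 'https://list.jd.com/list.html'


def normalize_item_url(href, base=LIST_PAGE):
    """Turn a relative or protocol-relative href into an absolute URL."""
    return urljoin(base, href)


def extract_item_id(item_url):
    """Return the file name of the URL path without its extension."""
    path = urlparse(item_url).path        # drops any query string or fragment
    file_name = posixpath.basename(path)  # e.g. "12345.html"
    stem, _ext = posixpath.splitext(file_name)
    return stem


if __name__ == '__main__':
    # Hypothetical href, for illustration only.
    url = normalize_item_url('//item.jd.com/12345.html')
    print(url)
    print(extract_item_id(url))
```

`urljoin` copies the scheme from the base URL when the href starts with `//`, and leaves an already absolute URL unchanged, so the same call covers both cases. Reading the id from the parsed path rather than from the raw string also keeps a trailing query string out of the result.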
The above is the main content on python spider.py. If it did not solve your problem, see the following articles:
python scrapy runspider mangafox_spider.py
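Python crawler: scraping Baidu Baike data
What does beginner Python code look like?
Spider (web crawler)
Running multiple spiders in sequence
scrapy: how to import settings in order to override them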