Python web scraping study notes, lesson 4
Posted by helenandyoyo
Python web scraping: the PyQuery library
PyQuery exercise 1
# =========== PyQuery exercise 1 ===========
html = '''
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
'''
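from pyquery import PyQuery as pq

# Initialise from a URL: pyquery downloads the page and parses it
doc = pq(url='http://www.baidu.com', encoding='utf-8')
print(doc)
print(type(doc))
print(doc('head'))

# Initialise from the HTML string defined above
doc = pq(html)
# Descendant selector: <li> elements inside .list inside #container
print(doc('#container .list li'))
# Every <li> in the document
print(doc('li'))

items = doc('.list')
print(type(items))
print(items)
# find() searches all descendants of the selection
lis = items.find('li')
print(lis)
# children() returns only the direct children
lis2 = items.children()
print(lis2)
# children() with a selector keeps only the matching direct children
li2 = items.children('.active')
print(li2)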
PyQuery exercise 2
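# =========== PyQuery exercise 2 ===========
# Scrape book information from Douban Books (https://book.douban.com/)
import requests
from pyquery import PyQuery as pq

# Douban may reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://book.douban.com/', headers=headers)
doc = pq(response.content)
# items() yields each matched .info block as its own PyQuery object
lis = doc('.info').items()
book_items = []
for li in lis:
    item = {}
    item['title'] = li('.title a').text()
    item['link'] = li('.title a').attr('href')
    author = li('div.author').text()
    # text() gives '' (None in older pyquery) when there is no author element
    if author:
        # remove all whitespace from the author string
        item['author'] = "".join(author.split())
    else:
        item['author'] = author
    book_items.append(item)
print(book_items)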