Crawler Day 04_01 (Scraping a Baidu Page)

Posted by 窃语


import urllib.request
import http.cookiejar

# Request headers that mimic a desktop browser (IE 11 on Windows 8.1)
head = {
    "Connection": "Keep-Alive",
    "Accept": "text/html, application/xhtml+xml, */*",
    "Accept-Language": "en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko",
}

# Build an opener that stores cookies and sends the given headers
def makeMyOpener(head):
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # addheaders expects a list of (name, value) tuples
    opener.addheaders = list(head.items())
    return opener

# Fetch a Baidu search results page through the cookie-aware opener
oper = makeMyOpener(head)
url = "https://www.baidu.com/s?ie=utf-8&f=3&rsv_bp=1&rsv_idx=1&tn=baidu&wd=python%20str%20%E8%BD%AC%20int&oq=python%2520str%2520%25E8%25BD%25AC%2520int&rsv_pq=c24aa0760000154b&rsv_t=c323uk7fLXupzfPqhHcqM%2F6l8k7Re4K90ZvzI33LDwW0kHYMiSED9rhKzCg&rqlang=cn&rsv_enter=0&prefixsug=python%2520str%2520%25E8%25BD%25AC%2520int&rsp=0"
uop = oper.open(url, timeout=10)  # timeout is in seconds
data = uop.read()
html = data.decode("utf-8")  # the URL requests ie=utf-8
print(html)
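
The listing hard-codes a long search URL and decodes the response as UTF-8. As a rough sketch of how those two steps could be made less brittle, the block below builds the query string from a keyword and picks the charset from the response headers. The helper names `build_search_url` and `decode_body` are hypothetical, and it is an assumption that `ie` and `wd` alone are enough for Baidu to return results; the `rsv_*` values in the original URL look session-specific and are left out here.

```python
import urllib.parse
from email.message import Message

BASE_URL = "https://www.baidu.com/s"  # same endpoint as in the listing


def build_search_url(keyword: str) -> str:
    """Return a search URL for the keyword (hypothetical helper)."""
    # Assumption: only ie and wd are sent; other parameters are omitted.
    params = {"ie": "utf-8", "wd": keyword}
    # quote_via=quote encodes spaces as %20, as in the original URL.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{BASE_URL}?{query}"


def decode_body(data: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode bytes with the Content-Type charset, else the fallback (hypothetical helper)."""
    charset = headers.get_content_charset() or fallback
    # errors="replace" keeps a stray bad byte from aborting the whole decode.
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    print(build_search_url("python str 转 int"))
```

With the objects from the listing, the response could then be decoded as `decode_body(data, uop.headers)`, since the headers object returned by `urllib` is a subclass of `email.message.Message`.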

 
