A Simple Python Crawler

Posted 2020-12-29 lxh777

Preface: this article was compiled by the editors of 小常识网 (cha138.com). It walks through a simple Python crawler built on urllib and is offered as a reference.

from urllib import request, parse
from urllib.error import HTTPError, URLError


def get(url, headers=None):
    # GET request: no form data is sent
    return urlrequest(url, headers=headers)


def post(url, form, headers=None):
    # POST request: form is a dict of form fields
    return urlrequest(url, form, headers=headers)


def urlrequest(url, form=None, headers=None):
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    if headers is None:
        # Fall back to a browser-like User-Agent
        headers = {
            'User-Agent': user_agent
        }
    html_bytes = b''
    try:
        if form:
            # POST
            # Encode the form dict as a query string
            form_str = parse.urlencode(form)
            # Convert the string to bytes
            form_bytes = form_str.encode('utf-8')
            req = request.Request(url, data=form_bytes, headers=headers)
        else:
            # GET
            # Build the Request with the headers attached
            req = request.Request(url, headers=headers)

        # Send the request with a 5-second timeout
        response = request.urlopen(req, timeout=5)
        html_bytes = response.read()
    except HTTPError as e:
        print(e)
    except URLError as e:
        print(e)
    # Empty bytes are returned if the request failed
    return html_bytes


if __name__ == '__main__':
    # post
    # url = 'http://fanyi.baidu.com/sug'
    # form = {
    #     'kw': '鹰'
    # }
    # html_bytes = post(url, form=form)
    # print(html_bytes)

    url = 'http://www.baidu.com'

    html_bytes = get(url)
    print(html_bytes.decode('utf-8'))
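
The POST branch of urlrequest turns the form dict into a URL-encoded string and then into bytes before handing it to request.Request. The sketch below isolates that step so it can be inspected without touching the network. The helper name build_request is hypothetical (it is not part of the original code); it only constructs the Request object and never calls urlopen.

```python
from urllib import parse, request


def build_request(url, form=None, headers=None):
    """Hypothetical helper: build a urllib Request without sending it."""
    data = None
    if form:
        # dict -> 'key=value&...' string -> UTF-8 bytes, as in urlrequest
        data = parse.urlencode(form).encode('utf-8')
    return request.Request(url, data=data, headers=headers or {})


get_req = build_request('http://www.baidu.com')
post_req = build_request('http://fanyi.baidu.com/sug', form={'kw': '鹰'})

# urllib picks the HTTP method from whether a body is attached
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # the percent-encoded form body as bytes
```

Because urllib infers the method from the presence of data, an empty form dict would be sent as a GET; this matches the `if form:` check in urlrequest.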
