A simple Python crawler
Posted by lxh777
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces a simple Python crawler, and we hope you find it useful.
from urllib import request, parse
from urllib.error import HTTPError, URLError


def get(url, headers=None):
    return urlrequest(url, headers=headers)


def post(url, form, headers=None):
    return urlrequest(url, form, headers=headers)


def urlrequest(url, form=None, headers=None):
    user_agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36')
    if headers is None:
        headers = {'User-Agent': user_agent}
    html_bytes = b''
    try:
        if form:
            # POST
            # Convert the form dict to a URL-encoded string
            form_str = parse.urlencode(form)
            # Convert the string to bytes for the request body
            form_bytes = form_str.encode('utf-8')
            req = request.Request(url, data=form_bytes, headers=headers)
        else:
            # GET
            # Build the Request object with the User-Agent header
            req = request.Request(url, headers=headers)
        # Send the request and read the raw response body
        response = request.urlopen(req, timeout=5)
        html_bytes = response.read()
    except HTTPError as e:
        print(e)
    except URLError as e:
        print(e)
    return html_bytes


if __name__ == '__main__':
    # POST example (commented out): query Baidu Translate's suggestion endpoint for '鹰' (eagle)
    # url = 'http://fanyi.baidu.com/sug'
    # form = {
    #     'kw': '鹰'
    # }
    # html_bytes = post(url, form=form)
    # print(html_bytes)

    # GET example
    url = 'http://www.baidu.com'
    html_bytes = get(url)
    print(html_bytes.decode('utf-8'))
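To check how the crawler chooses between GET and POST without touching the network, the request-building part can be pulled out and inspected on its own. This is a minimal sketch, not the author's code: `build_request` and `USER_AGENT` are hypothetical names, and the function only constructs a `urllib.request.Request` without sending it.

```python
from urllib import parse, request

# Same desktop Chrome User-Agent string the crawler above sends
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36')


def build_request(url, form=None, headers=None):
    """Hypothetical helper: build (but do not send) a GET or POST Request."""
    if headers is None:
        headers = {'User-Agent': USER_AGENT}
    if form:
        # POST: URL-encode the form dict, then encode the string to bytes
        data = parse.urlencode(form).encode('utf-8')
        return request.Request(url, data=data, headers=headers)
    # GET: no body, only headers
    return request.Request(url, headers=headers)


if __name__ == '__main__':
    post_req = build_request('http://fanyi.baidu.com/sug', form={'kw': '鹰'})
    print(post_req.get_method())  # POST
    print(post_req.data)          # b'kw=%E9%B9%B0'
    get_req = build_request('http://www.baidu.com')
    print(get_req.get_method())   # GET
```

`Request.get_method()` reports POST whenever `data` is not `None`, so passing the encoded form is all it takes to switch methods. Sending either request is then the same `request.urlopen(req, timeout=5)` call used in the crawler.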
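The GET example decodes the page as UTF-8 unconditionally, which raises `UnicodeDecodeError` on pages served in another encoding (GBK is common on older Chinese sites). A more defensive variant reads the charset from the `Content-Type` header: the response returned by `urlopen` exposes its headers as `response.headers`, an `email.message.Message` subclass that provides `get_content_charset()`. The sketch below assumes a UTF-8 fallback when no usable charset is declared; `decode_body` is a hypothetical helper name.

```python
from email.message import Message


def decode_body(headers, body, default='utf-8'):
    """Hypothetical helper: decode bytes using the charset declared in headers."""
    # get_content_charset() returns the lowercased charset parameter, or None
    charset = headers.get_content_charset() or default
    try:
        # errors='replace' keeps one bad byte from aborting the whole page
        return body.decode(charset, errors='replace')
    except LookupError:
        # The header named a codec Python does not know: use the fallback
        return body.decode(default, errors='replace')


if __name__ == '__main__':
    # Simulate response headers offline; response.headers behaves the same way
    headers = Message()
    headers['Content-Type'] = 'text/html; charset=GBK'
    print(decode_body(headers, '你好'.encode('gbk')))  # 你好
```

With a live response it would be called as `decode_body(response.headers, response.read())`.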