A Simple Python Crawler

Posted by lxh777

Preface: This article was compiled by the editors of 小常识网 (cha138.com). It presents a simple Python crawler built on urllib and is intended as a reference.

from urllib import request, parse
from urllib.error import HTTPError, URLError


def get(url, headers=None):
    """Send a GET request and return the response body as bytes."""
    return urlrequest(url, headers=headers)


def post(url, form, headers=None):
    """Send a POST request with form data and return the response body as bytes."""
    return urlrequest(url, form, headers=headers)


def urlrequest(url, form=None, headers=None):
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    if headers is None:
        headers = {
            'User-Agent': user_agent
        }
    html_bytes = b''
    try:
        if form:
            # POST
            # Convert the form to a URL-encoded string
            form_str = parse.urlencode(form)
            # Convert the string to bytes
            form_bytes = form_str.encode('utf-8')
            req = request.Request(url, data=form_bytes, headers=headers)
        else:
            # GET
            req = request.Request(url, headers=headers)
        # Send the request; give up after 5 seconds
        response = request.urlopen(req, timeout=5)
        html_bytes = response.read()
    except HTTPError as e:
        print(e)
    except URLError as e:
        print(e)
    # Empty bytes are returned if the request failed
    return html_bytes


if __name__ == '__main__':
    # POST example
    # url = 'http://fanyi.baidu.com/sug'
    # form = {
    #     'kw': '鹰'
    # }
    # html_bytes = post(url, form=form)
    # print(html_bytes)

    url = 'http://www.baidu.com'
    html_bytes = get(url)
    print(html_bytes.decode('utf-8'))
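The way the code above switches between GET and POST can be checked without touching the network. The following is only a minimal sketch: `build_request` is a hypothetical helper that is not part of the original post, and it builds the `Request` object without sending it. It relies on the standard-library behaviour that a `Request` carrying `data` is sent as POST and one without `data` as GET.

```python
from urllib import parse, request


def build_request(url, form=None, headers=None):
    """Hypothetical helper: build a Request object without sending it."""
    data = None
    if form:
        # urlencode percent-encodes non-ASCII values as UTF-8 by default;
        # the result must be converted to bytes before it can be used as a body
        data = parse.urlencode(form).encode('utf-8')
    return request.Request(url, data=data, headers=headers or {})


get_req = build_request('http://www.baidu.com')
post_req = build_request('http://fanyi.baidu.com/sug', {'kw': '鹰'})

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'kw=%E9%B9%B0'
```

Inspecting the request this way is a cheap sanity check before pointing the crawler at a real site: it confirms the form was encoded as expected and that the intended HTTP method will be used.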
