Getting Started with Python 3 Web Scraping
Posted by liwuming
import urllib.request

url = "http://www.iciba.com/publish"
headers = {
    "Host": "www.iciba.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    # With the next header enabled, the server may return a gzip-compressed
    # body, and response.read().decode() then fails on the compressed bytes.
    # "Accept-Encoding": "gzip, deflate",
}
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
print(response.read().decode())
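Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
[Solution] The response body was not decompressed before decoding. A gzip stream starts with the bytes 0x1f 0x8b, so byte 0x8b at position 1 means the body is still gzip-compressed and must be decompressed first.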
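The cause can be reproduced without any network access. The following is a minimal sketch, assuming a made-up HTML string as a stand-in for the server's response body (the variable names are illustrative, not from the original post):

```python
import gzip

# Hypothetical stand-in for a compressed HTTP body; no request is sent.
original_text = "<html>你好</html>"
compressed_body = gzip.compress(original_text.encode("utf-8"))

# Every gzip stream begins with the magic bytes 0x1f 0x8b.
starts_with_gzip_magic = compressed_body[:2] == b"\x1f\x8b"

# Decoding the compressed bytes directly raises UnicodeDecodeError,
# because 0x8b is not a valid first byte of a UTF-8 sequence.
try:
    compressed_body.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decompressing first and decoding afterwards recovers the text.
recovered_text = gzip.decompress(compressed_body).decode("utf-8")
```

Here `decode_error.start` is the offset of the offending byte, which matches the "position 1" in the error message.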
import gzip
import urllib.request

url = "https://www.baidu.com"
headers = {
    "Host": "www.baidu.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    # Advertise only gzip, because gzip is the only encoding handled below
    "Accept-Encoding": "gzip",
}
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
content = response.read()

# Read the Content-Encoding header from the response
encoding = response.info().get("Content-Encoding")
if encoding == "gzip":
    # Decompress the gzip body before decoding it
    print(gzip.decompress(content).decode())
else:
    # The body was sent uncompressed, so decode it directly
    print(content.decode())
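A server may also answer with `deflate`, or with no compression at all. One possible way to cover these cases is a small helper like the sketch below. The function name `decode_body` and its parameters are hypothetical and not part of `urllib`. The fallback to raw deflate is an assumption about servers that omit the zlib wrapper.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Decompress raw bytes according to Content-Encoding, then decode to str.

    Hypothetical helper: handles gzip, deflate, and uncompressed bodies only.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # Standard case: deflate data inside a zlib wrapper
            raw = zlib.decompress(raw)
        except zlib.error:
            # Assumed fallback: raw deflate stream without the zlib header
            raw = zlib.decompress(raw, wbits=-zlib.MAX_WBITS)
    elif encoding != "identity":
        raise ValueError("unsupported Content-Encoding: " + encoding)
    return raw.decode(charset)
```

With the response object from the code above, it could be called as `decode_body(response.read(), response.info().get("Content-Encoding"))`. The charset defaults to UTF-8 here. Pages served in another encoding would need it passed explicitly.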