UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Posted by 别呀


When you use the urllib library to fetch a web page and print it with print(res.read().decode('utf-8')), the following error can appear:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Example:

from urllib import request
url = 'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%8B%97&oq=%E7%8B%97&rsp=-1'

headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Cookie': 'BDqhfp=%E7%8B%97%26%260-10-1undefined%26%260%26%261; BIDUPSID=4B61D634D704A324E3C7E274BF11F280; PSTM=1624157516; BAIDUID=4B61D634D704A324C7EA5BA47BA5886E:FG=1; __yjs_duid=1_f7116f04cddf75093b9236654a2d70931624173362209; indexPageSugList=%5B%22%E7%8B%97%22%2C%22%E7%8C%AB%E5%92%AA%22%2C%22%E5%B0%8F%E9%80%8F%E6%98%8E%22%5D; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BAIDUID_BFESS=5DD3805F1A4CC3C9562CEAC3C22A1408:FG=1; __yjs_st=2_YTMzN2ZlYWQwNjg5NTFlNGY4NTMxMDBhOTc0ZDQxZjYwZWI0NzBiNjU1N2UyOGRiY2MzNWQ4OTM2YjU4MGU4MmNjYTNiZTk4ZDFkMWE1YmU2ODZhNGMwYzQ3OGE1YjcxZjNmZTEzYWY2ZjNiNGYxNjc0NWNlYjY5YmRhMTI3MmI2N2ZjOTkyYWUwYTZlZDUyMzY3NTc3YmU0MWUwNGM3MDk5NWE1ZTRhNzE4NjQwYWJlMjE3OTg5YzdkYjc0NmE4MjBhMjA2MDBkZmIwNDhjMjYzZjYxMTcyOGM2OTZmYjRlOGUwNTc1N2ZhYWI5YzEwZTVkODg0ZjI4OWM2ZjcyZF83XzM0OWQ2ZTJh; H_PS_PSSID=34268_34099_33969_34222_31660_34226_33848_34113_34073_34107_26350_22159; delPer=0; PSINO=6; BA_HECTOR=al21a125ag2l25851j1genv370q; BDRCVFR[X_XKQks0S63]=mk3SLVN4HKm; firstShowTip=1; cleanHistoryStatus=0; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; BDRCVFR[-pGxjrCMryR]=mk3SLVN4HKm; userFrom=null; ab_sr=1.0.1_NzczYjg1NGJiOWUwOGQwM2E4YTE0MDJkM2E0YjQ4M2E1ZDk0YWQ1MGUyMmNjZTg4NzhjZDNkZDI0YjcwMjU5N2MxYmQxNWIwZmRjMWEwZjVkNmZkYzkwYTNiYTE3NDUwYWFkZDkyZWM3Njg3ZjQ0OGQ5ZWU3YTkxNDk1M2FiZTAxZTY5NmY3ZjA1NDgxODE3ZWE4MWQxOWUwMmIwYmUxZA==',
'Host': 'image.baidu.com',
'Referer': 'https://image.baidu.com/',
'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
'sec-ch-ua-mobile': '?0',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}

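# Build the request with the browser-like headers above and fetch the page.
req = request.Request(url, headers=headers)
res = request.urlopen(req)
# res.read() returns the body exactly as sent; urllib does not decompress it.
print(res.read().decode('utf-8'))

# Result: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte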
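
The error can be reproduced without any network access, which helps confirm where the byte 0x8b comes from. The sketch below assumes the server answered with a gzip-compressed body; the sample HTML string is made up for illustration and stands in for res.read().

```python
import gzip

# Stand-in for res.read() on a gzip-compressed response (hypothetical sample body).
raw = gzip.compress("<html>狗</html>".encode("utf-8"))

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
print(raw[:2].hex())

error = None
try:
    # 0x1f is a valid one-byte UTF-8 character, but 0x8b cannot start a character.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

Byte 0 (0x1f) decodes fine and byte 1 (0x8b) does not, which matches "position 1" in the traceback.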

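The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b). Because the request advertises 'Accept-Encoding': 'gzip, deflate, br', the server may return a compressed body, and decoding those compressed bytes directly as UTF-8 fails. Two solutions follow: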

Method 1: decompress the gzip data
Import the modules:

import gzip
from io import BytesIO

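Decompress the gzip data:

req = request.Request(url, headers=headers)
res = request.urlopen(req)
# Add the imports above and the following lines to the example.
# Wrap the compressed bytes in a file-like object, then decompress and decode them.
buff = BytesIO(res.read())
f = gzip.GzipFile(fileobj=buff)
data = f.read().decode('utf-8')
print(data)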
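
Method 1 assumes the response is always gzip. Since the request also advertises deflate and br, a slightly more defensive variant could check the Content-Encoding response header first. The following is only a sketch: decode_body is a hypothetical helper name, it handles gzip and deflate with the standard library, and it deliberately raises for anything else (such as br, which the standard library cannot decompress).

```python
import gzip
import zlib


def decode_body(raw, content_encoding, charset="utf-8"):
    """Decompress raw according to Content-Encoding, then decode it to text."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # Most servers send zlib-wrapped deflate data.
            raw = zlib.decompress(raw)
        except zlib.error:
            # Some send a raw deflate stream without the zlib header.
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    elif encoding not in ("", "identity"):
        # For example "br": not supported by this sketch.
        raise ValueError("unsupported Content-Encoding: " + encoding)
    return raw.decode(charset)


# Local check with a hypothetical sample body, no network needed.
sample = gzip.compress("狗".encode("utf-8"))
print(decode_body(sample, "gzip"))
```

With the example above it would be called as decode_body(res.read(), res.headers.get('Content-Encoding')).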

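Method 2: remove the Accept-Encoding header

Simply delete "Accept-Encoding": "gzip, deflate, br" from the request headers. The request then no longer asks for a compressed response, so the body can be decoded as UTF-8 directly.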
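
If the headers were copied from a browser and you would rather not edit them by hand, the entry can be filtered out in code. This is a minimal sketch: without_accept_encoding is a hypothetical helper, and the short headers dictionary stands in for the full one from the example. Only the Request object is built here; nothing is sent.

```python
from urllib import request


def without_accept_encoding(headers):
    """Return a copy of headers with any Accept-Encoding entry removed."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


# Shortened stand-in for the headers dictionary in the example.
headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0",
}

cleaned = without_accept_encoding(headers)
req = request.Request("https://image.baidu.com/", headers=cleaned)

# Request stores header names capitalized, e.g. "Accept-encoding".
print(req.has_header("Accept-encoding"))
```

The rest of the example (request.urlopen(req) followed by res.read().decode('utf-8')) stays the same.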
