GBK Errors

Posted by stephen2016


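When testing code in CMD, print() output is encoded with GBK, the console's default encoding. If the Unicode text contains a character that GBK cannot represent, the following error appears: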

UnicodeEncodeError: 'gbk' codec can't encode character u'\u200e' in position 43: illegal multibyte sequence
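To find out which character triggers the error before printing, the encode step can be attempted explicitly. The following is a minimal sketch and not part of the original post; the helper name first_unencodable is hypothetical, and it assumes the target encoding is GBK.

```python
def first_unencodable(text, encoding="gbk"):
    """Return (position, character) of the first character that cannot be
    encoded, or None when the whole string encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first offending character.
        return exc.start, text[exc.start]
    return None


# U+200E (left-to-right mark) is the character from the error message above.
print(first_unencodable("abc\u200edef"))
print(first_unencodable("plain ASCII text"))
```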

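One workaround is to pass the errors parameter with the value ignore, so that the offending characters are dropped instead of raising an exception. The failure happens while encoding, so the parameter has to be given to encode(); decode() accepts the same parameter, which the Python documentation describes as follows.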

 

bytes.decode(encoding="utf-8", errors="strict")
bytearray.decode(encoding="utf-8", errors="strict")

Return a string decoded from the given bytes. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.
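The difference between the error handlers is easiest to see side by side. The sketch below is an illustration added here, not code from the original post; the sample strings are made up, and it shows the handlers on both str.encode() and bytes.decode().

```python
text = "news\u200e title"  # U+200E has no GBK representation

# Encoding side: this is where the UnicodeEncodeError comes from.
ignored = text.encode("gbk", errors="ignore")             # character is dropped
replaced = text.encode("gbk", errors="replace")           # character becomes b"?"
escaped = text.encode("gbk", errors="backslashreplace")   # character becomes an escape

print(ignored)
print(replaced)
print(escaped)

# Decoding side: the same parameter handles bytes that are invalid for the codec.
raw = b"abc\xff"  # 0xFF is never valid in UTF-8
decoded_ignore = raw.decode("utf-8", errors="ignore")
decoded_replace = raw.decode("utf-8", errors="replace")   # inserts U+FFFD

print(decoded_ignore)
print(decoded_replace)
```

Using ignore silently loses data, so replace or backslashreplace may be preferable when it matters that something was removed.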

 

import urllib.request
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    # Request headers that make the crawler look like a regular browser.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
        'Accept': 'text/html;q=0.9,*/*;q=0.8',
    }

    opener = urllib.request.build_opener()
    # addheaders expects a list of (name, value) tuples, not a dict.
    opener.addheaders = list(headers.items())
    page = 1
    while page <= max_pages:
        url = r'http://news.zjicm.edu.cn/web_2/pages/type.php?Page_Id=8C2D81A53403FED2AEE8F706017F8C3E&PageNo=' + str(page)
        source_code = opener.open(url).read()
        soup = BeautifulSoup(source_code, 'html.parser')
        for link in soup.find(class_='list-nopic').find_all('a'):
            # get_text() also works when the tag has nested children,
            # where .string would be None.
            # Drop characters GBK cannot represent, then decode back to str.
            str1 = link.get_text().encode('gbk', errors='ignore')
            print(str1.decode('gbk', errors='ignore'))
        page += 1

trade_spider(10)
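The encode/decode round trip used inside the loop can be pulled into a small helper so that any print() call is protected. This is a minimal sketch rather than the original author's code: the names gbk_safe and safe_print are hypothetical, and it assumes the console encoding is reported by sys.stdout.encoding, falling back to GBK when it is not.

```python
import sys


def gbk_safe(text, encoding="gbk", errors="ignore"):
    """Round-trip text through `encoding` so the result contains only
    characters that the encoding can represent."""
    return text.encode(encoding, errors=errors).decode(encoding)


def safe_print(text):
    # Use the console's own encoding when available; GBK is an assumed fallback.
    encoding = getattr(sys.stdout, "encoding", None) or "gbk"
    print(gbk_safe(text, encoding, errors="replace"))


safe_print("headline\u200e text")
```

On Python 3.7 and later, calling sys.stdout.reconfigure(errors="replace") once at startup is another option that should have a similar effect without wrapping each call.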

 
