Python: file encoding when reading a file via URL

Question

I need to fetch a file by URL and return the line (string) in that file with the highest word count. Here is my code:

from urllib.request import urlopen

def wordiest_line(url):
    data = urlopen(url)

    if data:
        max_words = 0
        max_line = ""
        for line in data.readlines(): 
            #print(line)
            the_encoding = "utf-8"
            line = line.decode(the_encoding)
            line = line.rstrip()
            line_words = line.split()
            if len(line_words) > max_words:
                max_words = len(line_words)
                max_line = line

        #print("%s to RETURN\n" % max_line)
        return max_line

    else:
        return None

These are some of the URLs used to test this function:

“Qazxswpoi”
“Qazxswpoi”
“Qazxswpoi”

For links 1 and 3 it works fine. But http://math-info.hse.ru/f/2017-18/dj-prog/lines1.txt does not work properly because of the file encoding: it contains some text in Cyrillic.

I tried to determine what encoding the string uses and decode it accordingly. Here is the code:

http://lib.ru/FOUNDATION/3laws.txt_Ascii.txt

Now http://math-info.hse.ru/f/2017-18/dj-prog/lines2.txt fails with the error: 'charmap' codec can't decode byte 0xdc in position 8: character maps to <undefined>

The other URLs still work. Do you have any suggestions on how to fix this?

Answer 1
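
One possible approach, offered as a sketch rather than a verified fix: decode the whole response once instead of guessing per line. Try the charset the server declares in its Content-Type header first, then UTF-8, and only then fall back to a legacy encoding. The helper names `decode_text` and `wordiest_line_in` are made up for this example, and `cp1251` as the fallback is an assumption based on the files being Russian-language; it has not been checked against the URLs in the question.

```python
from urllib.request import urlopen


def decode_text(raw, declared=None, fallback="cp1251"):
    """Decode bytes with the declared charset, then UTF-8, then a fallback."""
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: move on to the next candidate.
            continue
    # Last resort (assumed encoding): never raise, and mark any
    # undecodable bytes with U+FFFD instead.
    return raw.decode(fallback, errors="replace")


def wordiest_line_in(text):
    """Return the first line with the most whitespace-separated words."""
    lines = text.splitlines()
    return max(lines, key=lambda line: len(line.split()), default="").rstrip()


def wordiest_line(url):
    with urlopen(url) as response:
        # Charset from the Content-Type header, or None if not declared.
        declared = response.headers.get_content_charset()
        raw = response.read()
    return wordiest_line_in(decode_text(raw, declared))
```

Single-byte encodings such as cp1251 or KOI8-R accept almost any byte sequence, so a wrong guess usually produces garbled text rather than an exception. That is why strict UTF-8 is tried before the fallback, and why a charset declared by the server is preferred when it is present.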

Another answer