Unable to encode/decode pprint output

Posted 2023-02-24

【Published】: 2012-06-08 15:35:43

【Question】:

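This question arises from a side effect of that one.

All of my .py files have the # -*- coding: utf-8 -*- encoding declaration on the first line, for example my api.py.

As I mentioned in the related question, I use HttpResponse to return the API documentation. Since I define the encoding with:

HttpResponse(cy_content, content_type='text/plain; charset=utf-8')

everything works, and I have no encoding problems when I call my API service, except for the string that pprint builds from a dictionary.

Some values in my dict contain Turkish characters, and pprint converts them to their escaped (unichr) equivalents, for example: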

API_STATUS = {
    1: 'müşteri',
    2: 'some other status message'
}

my_str = 'Here is the documentation part that contains Turkish chars like işüğçö'
my_str += pprint.pformat(API_STATUS, indent=4, width=1)
return HttpResponse(my_str, content_type='text/plain; charset=utf-8')

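And my plain text output looks like this:

Here is the documentation part that contains Turkish chars like işüğçö

{
    1: 'm\xc3\xbc\xc5\x9fteri',
    2: 'some other status message'
}

I tried decoding or encoding the pprint output to different encodings, but without success. What is the best practice for overcoming this?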

【Question discussion】:

【Answer 1】:

pprint appears to use repr by default. You can work around this by overriding PrettyPrinter.format:

# coding=utf8
# Python 2 code: it relies on the unicode type.

import pprint

class MyPrettyPrinter(pprint.PrettyPrinter):
    def format(self, object, context, maxlevels, level):
        # Return unicode values as UTF-8 text instead of their escaped repr.
        if isinstance(object, unicode):
            return (object.encode('utf8'), True, False)
        # Everything else keeps the default formatting.
        return pprint.PrettyPrinter.format(self, object, context, maxlevels, level)


d = {'foo': u'işüğçö'}

pprint.pprint(d)              # {'foo': u'i\u015f\xfc\u011f\xe7\xf6'}
MyPrettyPrinter().pprint(d)   # {'foo': işüğçö}
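
Because the override encodes unicode values to UTF-8, the text produced by MyPrettyPrinter in Python 2 is a byte string, which matters as soon as it is combined with unicode text. The snippet below is only a rough Python 3 illustration of that pitfall, with bytes standing in for the Python 2 str type. The names header, formatted and mix_error are hypothetical. Python 3 refuses the mix with a TypeError, whereas Python 2 attempts an implicit ASCII decode that fails on non-ASCII bytes.

```python
# Python 3 sketch; bytes plays the role of Python 2's str.
header = 'Status codes: işüğçö\n'                # text (unicode in Python 2)
formatted = "{1: 'müşteri'}".encode('utf-8')      # bytes, like the override's pformat result

try:
    page = header + formatted                      # mixing text and bytes
except TypeError as exc:
    mix_error = exc                                # Python 3 refuses outright
    print('cannot mix text and bytes:', exc)

page = header + formatted.decode('utf-8')          # decode once, then concatenate
print(page)
```

Decoding once, right where the formatted bytes are produced, keeps the rest of the code working with text only.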

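【Discussion】:

If, like me, you try to use this with pformat (rather than pprint) and pass the resulting string to a template engine such as jinja2, it gives you an error, which (according to this answer) you can resolve by calling unicode(MyPrettyPrinter().pformat(d), 'utf-8').

It would be very helpful if you could package your pprint, with its formatting options, on PyPI.

【Answer 2】:

You should use unicode strings instead of 8-bit strings: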

import pprint

API_STATUS = {
    1: u'müşteri',
    2: u'some other status message'
}

my_str = u'Here is the documentation part that contains Turkish chars like işüğçö'
my_str += pprint.pformat(API_STATUS, indent=4, width=1)

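The pprint module is designed to print arbitrarily nested structures in a readable way. To do that it prints each object's representation (its repr) rather than converting the object to a string, so you get escape syntax whether or not you use unicode strings. Still, if your documentation contains unicode, you really should use unicode literals!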
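
The difference between an object's representation and its string form can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the code from the question: bytes stands in for the Python 2 str type, and format_plain is a hypothetical helper that formats values as text instead of going through repr.

```python
import pprint

# Python 3 sketch: bytes behaves like Python 2's str (a raw byte sequence).
status_text = {1: 'müşteri', 2: 'some other status message'}
status_bytes = {key: value.encode('utf-8') for key, value in status_text.items()}

# pformat is built on repr(), and the repr of a byte string escapes every
# non-ASCII byte, which is the kind of output the question shows.
escaped = pprint.pformat(status_bytes)
print(escaped)


def format_plain(mapping, indent=4):
    """Hypothetical helper: one 'key: value' line per item, values as text."""
    lines = ['%s%s: %s' % (' ' * indent, key, mapping[key]) for key in sorted(mapping)]
    return '\n'.join(lines)


# Formatting the values as text keeps the Turkish characters readable.
readable = format_plain(status_text)
print(readable)
```

The helper only handles a flat mapping. For nested structures, the formatting hook that pprint itself provides is the more general route.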

Anyway, thg435 has given you a solution for changing this behavior of pformat.

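【Discussion】:

Are plain (non-unicode) strings called binary strings? I thought they were ASCII strings.

I tried that too, and I also tried Django's smart_str, smart_unicode and other methods... When I use a unicode string like u'müşteri', what I get is u'm\xfc\u015fteri'.

@FallenAngel - That is the representation of the unicode string produced by pformat. I see that your problem is a bit different from what I thought... I'll check again...

@jdi: In Python 2, the str type is a sequence of bytes.

@jdi - In Python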
