How to remove extra indentation of Python triple-quoted multi-line strings?

Posted 2023-02-21


Question (posted 2010-11-27 14:18:35):

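I have a Python editor where the user enters a script or code. Behind the scenes, that code is placed inside a main method and every line is indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script also affects the string, because a tab is inserted at the start of every line. A problematic script can be as simple as: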

"""foo
bar
foo2"""

So inside the main method it looks like:

def main():
    """foo
    bar
    foo2"""

and every line of the string after the first now starts with an extra tab.

Comments:

codereview.stackexchange.com/questions/60366/…

Answer 1:

textwrap.dedent from the standard library can undo the quirky indentation automatically.
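Here is a minimal sketch of this. The function and variable names are illustrative only. dedent removes only the whitespace prefix that all non-blank lines share. If the text starts on the same line as the opening quotes, that first line has no indentation, so nothing is removed. Escaping the first newline with a backslash avoids that problem:

```python
import textwrap

def main():
    # Text starts on the next line, so every non-blank line shares 4 spaces.
    starts_below = textwrap.dedent("""
    foo
    bar
    foo2
    """)
    # Text starts on the quote line: "foo" has no indentation, so the
    # common prefix is empty and dedent leaves the other lines untouched.
    starts_on_quote_line = textwrap.dedent("""foo
    bar
    foo2""")
    # A backslash right after the opening quotes escapes the first newline,
    # so "foo" carries the same indentation as the other lines.
    escaped_newline = textwrap.dedent("""\
    foo
    bar
    foo2""")
    return starts_below, starts_on_quote_line, escaped_newline

for s in main():
    print(repr(s))
```

This prints '\nfoo\nbar\nfoo2\n', then 'foo\n    bar\n    foo2', then 'foo\nbar\nfoo2'.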

Comments:

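The standard library never stops surprising.

Note that if the text starts as """foo, the first line lacks the leading indentation the other lines have, so dedent won't do anything. It works if you wait to start foo on the next line and escape the first newline like this: `"""\`

To address the shortcoming @ScottH mentioned, see my answer about inspect.cleandoc.

Answer 2: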

As far as I can tell, a better answer here may be inspect.cleandoc. It does most of what textwrap.dedent does, and it also fixes textwrap.dedent's problem with the leading line.

The following example shows the difference:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
    baz
    foobar
    foobaz
    """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
bar\tbaz
"""
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'

Note that inspect.cleandoc also expands internal tabs to spaces. That may not suit everyone's use case, but it works fine for me.
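Here is a minimal sketch of that trade-off. The first sample string is shaped like the one in the question after the editor has prefixed a tab to each line; both sample strings are illustrative. cleandoc removes the added tabs even though the text starts on the quote line, but it also turns tabs inside the text into spaces. Combining textwrap.dedent with str.strip keeps internal tabs. That alternative needs the text to start on its own line, and strip() would also drop any extra indentation on the first content line:

```python
import inspect
import textwrap

# What the question's string looks like after a tab is prefixed to each line.
wrapped = "foo\n\tbar\n\tfoo2"
print(repr(inspect.cleandoc(wrapped)))  # 'foo\nbar\nfoo2'
print(repr(textwrap.dedent(wrapped)))   # unchanged: no common prefix

# A string with a tab inside a line.
tabbed = "\n    a\tb\n    c\n"
print(repr(inspect.cleandoc(tabbed)))         # 'a   b\nc'  (tab expanded)
print(repr(textwrap.dedent(tabbed).strip()))  # 'a\tb\nc'   (tab kept)
```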

Comments:

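Note that the two are not otherwise identical: cleandoc does more processing than just removing indentation. At the very least, it expands '\t' to spaces.

That's true, but I didn't notice it at the time. I'll update the answer to at least mention the tab expansion.

You could also use textwrap.dedent(s).strip() to leave tabs alone while still handling leading and trailing newlines.

I wrote this answer in a more general context than the one the question was asked in. I was looking to reflow docstrings for documentation purposes, so the collapsing is helpful. You're right that you could post-process the textwrap.dedent output for more specific scenarios. When I answered, I overlooked the nuance of the original question. Still, I believe my answer is more helpful in the general case.

IDK if this is a silly mistake in the Python world, but be careful when using \n somewhere inside a triple-quoted string: inspect.cleandoc won't clean that up. (Speaking from experience.)

Answer 3: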

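Everything after the first line of a multi-line string is part of the string, and the parser does not treat it as indentation. You are free to write: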

def main():
    """foo
bar
foo2"""
    pass

and it will do the right thing.

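On the other hand, that is unreadable, and Python knows it. So if a docstring's lines from the second one onward start with whitespace, help() strips that common leading whitespace when it displays the docstring. Therefore help(main) and help(main2) below produce the same help text.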

def main2():
    """foo
    bar
    foo2"""
    pass

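You can check this without the interactive pager, assuming that inspect.getdoc applies the same cleanup (inspect.cleandoc) that help() uses. Both functions are copied from above. On Python 3.13+ the compiler already strips docstring indentation, so main2.__doc__ itself differs between versions, but getdoc should return the same text either way:

```python
import inspect

def main():
    """foo
bar
foo2"""
    pass

def main2():
    """foo
    bar
    foo2"""
    pass

# getdoc() strips the common indentation of lines 2..n, as help() does.
print(repr(inspect.getdoc(main)))   # 'foo\nbar\nfoo2'
print(repr(inspect.getdoc(main2)))  # 'foo\nbar\nfoo2'
print(inspect.getdoc(main) == inspect.getdoc(main2))  # True
```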
Comments:

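Thanks for the reply. Unfortunately, the indentation is fully automated: my code (in Java) reads the script in as a string and indents every line of that string.

I don't think triple quotes are used only for docstrings. This automation doesn't apply anywhere else.

@tribbloid The special docstring logic exists for one use case: making help() do something nice by default. To apply the same dedenting logic elsewhere, you can use textwrap.dedent(), as described in every other answer to this question.

Answer 4: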

The only way I see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.
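Here is a minimal sketch of that idea, applied to the string value at run time. strip_added_indent is a hypothetical helper. It assumes the wrapper added exactly n copies of one indent unit (a tab by default) to every line:

```python
def strip_added_indent(text: str, n: int, unit: str = "\t") -> str:
    """Remove n copies of `unit` from the start of every line after the first.

    Hypothetical helper: lines that do not start with the full prefix
    are left unchanged.
    """
    first, *rest = text.split("\n")
    prefix = unit * n
    cleaned = [line[len(prefix):] if line.startswith(prefix) else line
               for line in rest]
    return "\n".join([first, *cleaned])

# The string from the question after one tab was added to each line.
print(repr(strip_added_indent("foo\n\tbar\n\tfoo2", 1)))  # 'foo\nbar\nfoo2'
```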

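If that indentation isn't known in advance, you could add a trailing newline before inserting the code, and then strip the number of tabs found on the last line...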

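A third solution is to parse the data, find where each multi-line quote starts, and not add your indentation to any line until that quote is closed.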
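Here is a sketch of this third approach, written in Python for illustration (the editor in the question is in Java). It uses the standard tokenize module to find string tokens that span several lines, then skips the lines that begin inside them. indent_preserving_strings is a hypothetical name. The sketch assumes the user's script is complete, tokenizable Python. It also assumes that, on Python 3.12+, the script contains no multi-line f-strings, because those are tokenized differently:

```python
import io
import tokenize

def indent_preserving_strings(source: str, prefix: str = "    ") -> str:
    """Indent each line of `source` by `prefix`, except lines that start
    inside a multi-line string literal (hypothetical helper)."""
    # 1-based row numbers of lines that begin inside a string token.
    inside_string = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            start_row, end_row = tok.start[0], tok.end[0]
            # Rows after the opening row belong to the literal's contents.
            inside_string.update(range(start_row + 1, end_row + 1))

    out = []
    # Iterate lines the same way tokenize read them, so row numbers match.
    for row, line in enumerate(io.StringIO(source), start=1):
        if row in inside_string or not line.strip():
            out.append(line)  # string contents and blank lines stay as-is
        else:
            out.append(prefix + line)
    return "".join(out)

script = 'x = """foo\nbar\nfoo2"""\nprint(x)\n'
program = "def main():\n" + indent_preserving_strings(script)
print(program)
```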

I think there is a better solution...

Comments:

Thanks for the reply. So you're suggesting I strip the indentation that has already been inserted on every line? I'm confused...

Answer 5:

To show the difference between textwrap.dedent and inspect.cleandoc more clearly:

Behavior when the text starts on the same line as the opening quotes

import textwrap
import inspect

string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior when the text starts on the line after the opening quotes

string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """

print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 texwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 texwrap.dedent=' + repr(textwrap.dedent(string2)))

输出

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 texwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 texwrap.dedent='\nString\nwith\nindentation\n'

Answer 6:

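I wanted to keep exactly what is between the triple-quote lines and remove only the common leading indentation. I found that textwrap.dedent and inspect.cleandoc don't quite do that, so I wrote this. It uses os.path.commonprefix.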

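import re
from os.path import commonprefix

def ql(s, eol=True):
    lines = s.splitlines()
    # Keep the first line (the text right after the opening quotes) as-is;
    # an empty first line (quotes followed by a newline) is dropped.
    l0 = None
    if lines:
        l0 = lines.pop(0) or None
    # Character-wise common prefix of the remaining lines, including the
    # line that holds the closing quotes.
    common = commonprefix(lines)
    # Only the leading whitespace of that prefix counts as indentation.
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [line[n:] for line in lines]
    # Typically the closing-quote line is now empty; dropping it removes
    # the trailing newline when eol=False.
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    return "\n".join(lines2)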

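This can quote any string at any indentation. I wanted it to include the trailing newline by default, with an option to remove it, so that it can neatly quote any string.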

Example:

print(ql("""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))

print(ql("""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

Only 4 spaces of common indentation are removed from the second string, because the closing """ is indented less than the quoted text:

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this would be simpler, otherwise I wouldn't have bothered!

Answer 7:

So, if I understand correctly, you take whatever the user enters, indent it properly, and add it to the rest of the program, then run the whole program.

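So after you put the user input into the program, you could run a regex that basically reverts the forced indentation. Something like this: inside triple quotes, replace every newline followed by four spaces (or a tab) with just a newline.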
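Here is a minimal sketch of that regex approach, which rests on several assumptions. It assumes the wrapper added exactly one level of indentation (four spaces or a tab). It also assumes the program uses only """-delimited strings, with no escaped or nested triple quotes and no """ inside comments. undo_forced_indent is a hypothetical name:

```python
import re

# Non-greedy match of a """...""" literal, spanning newlines.
TRIPLE_QUOTED = re.compile(r'"""(.*?)"""', re.DOTALL)
# A newline followed by one forced indentation level.
FORCED_INDENT = re.compile(r"\n(?: {4}|\t)")

def undo_forced_indent(program: str) -> str:
    def fix(match: re.Match) -> str:
        # Remove one indentation level after each newline inside the literal.
        body = FORCED_INDENT.sub("\n", match.group(1))
        return '"""' + body + '"""'
    return TRIPLE_QUOTED.sub(fix, program)

program = 'def main():\n\tx = """foo\n\tbar\n\tfoo2"""\n\tprint(x)\n'
print(undo_forced_indent(program))
```

Code outside the triple quotes keeps its indentation, so the result is still a valid function body.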

Comments:

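Yes, that's right. It's the only possible solution I've come up with. Not sure why I haven't just gone ahead with it... I guess if there's nothing better, I may have to.

@thraxil's suggestion to use textwrap.dedent is the way to go. Consider changing your accepted answer.

@ChrisCalo @bbenne10's answer is better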
