How to strip all whitespace from a string

Posted 2023-02-24

Original title: How to strip all whitespace from string (posted 2011-04-13 23:39:04)

Question:

How do I strip all the spaces in a Python string? For example, I want a string like strip my spaces to become stripmyspaces, but I can't seem to accomplish that with strip():

>>> 'strip my spaces'.strip()
'strip my spaces'

Comments:

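Note that str.strip only affects leading and trailing whitespace.

...It also doesn't handle Unicode de facto whitespace such as the zero-width space. See ***.com/a/3739928/2693875 for details.

Answer 1: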

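Parse (split the text into words)

Strip (each word)

Join (the words back together)

The final one-liner:

' '.join(word.strip() for word in message_text.split())

Note that joining with ' ' collapses each run of whitespace into a single space rather than removing it; join with '' instead to remove all whitespace. Since split() already discards the whitespace around each word, the strip() call is redundant.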
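A minimal sketch of the split/join idea, showing that only the separator passed to join decides whether whitespace is collapsed or removed. The function names and the sample text are hypothetical, chosen for illustration:

```python
def collapse_whitespace(message_text: str) -> str:
    # split() with no separator splits on runs of whitespace and drops empty
    # pieces, so every word is already stripped; ' ' collapses each run
    return " ".join(message_text.split())


def remove_whitespace(message_text: str) -> str:
    # Joining with '' removes the whitespace entirely
    return "".join(message_text.split())


print(collapse_whitespace("  strip \t my\nspaces "))  # strip my spaces
print(remove_whitespace("  strip \t my\nspaces "))    # stripmyspaces
```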

Answer 2:

Remove leading whitespace in Python

string1 = "    This is Test String to strip leading space"
print(string1)
print(string1.lstrip())

Remove trailing whitespace in Python

string2 = "This is Test String to strip trailing space     "
print(string2)
print(string2.rstrip())

Remove whitespace at both ends of a string in Python

string3 = "    This is Test String to strip leading and trailing space      "
print(string3)
print(string3.strip())

Remove all spaces in Python (only the space character itself; tabs and newlines are kept)

string4 = "   This is Test String to test all the spaces        "
print(string4)
print(string4.replace(" ", ""))
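replace(" ", "") removes only the ordinary space character U+0020. A quick check with a made-up string (string5 is just sample data) shows tabs and newlines surviving, whereas split() plus join() removes them:

```python
string5 = "\tTab\tand\nnewline "

# replace() only removes U+0020 spaces; the tabs and the newline survive
print(repr(string5.replace(" ", "")))  # '\tTab\tand\nnewline'

# split() with no argument splits on any whitespace, so join() drops it all
print(repr("".join(string5.split())))  # 'Tabandnewline'
```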

Answer 3:

For Python 3:

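>>> import re
>>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
'stripmyASCIIandUnicodespaces'
>>> # Or, depending on the situation:
>>> re.sub(r'(\s|\u180E|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
'stripallkindsofwhitespace'

...to handle any whitespace characters you haven't thought of. Believe us, there are plenty.

\s on its own always covers the ASCII whitespace:

(regular) space, tab, new line (\n), carriage return (\r), form feed, vertical tab

Additionally:

for Python 2 with re.UNICODE enabled, and for Python 3 without any extra action,

...\s also covers Unicode whitespace characters, for example:

non-breaking space, em space, ideographic space,

...and so on. See the full list here, under "Unicode characters with White_Space property".

However, \s does not cover characters that are not classified as whitespace but are de facto whitespace, such as:

zero-width joiner, Mongolian vowel separator, zero-width non-breaking space (a.k.a. byte order mark),

...and so on. See the full list here, under "Related Unicode characters without White_Space property".

So these 6 characters are covered by the list in the second regex: \u180E|\u200B|\u200C|\u200D|\u2060|\uFEFF.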
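The same idea can be sketched without a regex: str.isspace() roughly corresponds to what \s matches for str patterns, and an explicit set adds the six invisible characters that lack the White_Space property (Mongolian vowel separator U+180E, zero-width space, ZWNJ, ZWJ, word joiner, BOM). EXTRA_INVISIBLES and remove_all_whitespace are hypothetical names for this sketch, not part of any library:

```python
# Characters that behave like blank space but are not classified as
# whitespace, so neither str.isspace() nor \s matches them.
EXTRA_INVISIBLES = {
    "\u180E",  # MONGOLIAN VOWEL SEPARATOR
    "\u200B",  # ZERO WIDTH SPACE
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\uFEFF",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def remove_all_whitespace(text: str) -> str:
    # Keep a character only if it is neither Unicode whitespace
    # (per str.isspace) nor one of the invisible extras above
    return "".join(
        c for c in text if not c.isspace() and c not in EXTRA_INVISIBLES
    )


print(remove_all_whitespace("\uFEFF\t strip all \u000A kinds of \u200B whitespace \n"))
# stripallkindsofwhitespace
```

Drop "\uFEFF" from the set if you need to preserve a byte order mark.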

Sources:

https://docs.python.org/2/library/re.html
https://docs.python.org/3/library/re.html
https://en.wikipedia.org/wiki/Unicode_character_property

Comments:

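This solution is far less known than the accepted answer. It is more explicit than the other answers, so it takes the cake for me. The accepted answer is old, from a time when almost nobody used Python 3, so it doesn't cover Unicode strings. It also goes into optimization unnecessarily, which wasn't asked for. That's why I updated this answer to make it the best one.

@GregDubicki Thanks for the addition. I added the simple option because in some cases the full list may be overkill, or even harmful if you need to keep the BOM (hopefully you don't). I also noted the MVS, since it's my favorite Unicode character and was still Zs when I originally wrote this. :P

That edit seems misleading (\s is sufficient for Unicode strings in many cases, not just ASCII), and it adds complexity by emphasizing Python 2 (now EOL, and the question is tagged Python 3). I think a brief mention of Python 2 and an explanation of the difference is enough to choose the right approach.

Answer 4: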

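If you don't need optimal performance and just want something simple, you can define a basic function that tests each character with the str class's built-in isspace method: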

def remove_space(input_string):
    no_white_space = ''
    for c in input_string:
        if not c.isspace():
            no_white_space += c
    return no_white_space

Building the no_white_space string this way won't give ideal performance, but the solution is easy to understand.

>>> remove_space('strip my spaces')
'stripmyspaces'

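If you don't want to define a function, you can turn this into something similar with a list comprehension, borrowing the join solution from the *** answer: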

>>> "".join([c for c in "strip my spaces" if not c.isspace()])
'stripmyspaces'
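A variant of remove_space that appends the kept characters to a list and joins once at the end, which is the usual idiom for building strings in a loop. remove_space_join is a hypothetical name; CPython can sometimes optimize repeated += on a string in place, so for short inputs the gain is modest and this is mainly a matter of idiom:

```python
def remove_space_join(input_string: str) -> str:
    # Collect non-whitespace characters, then build the result string once
    kept = []
    for c in input_string:
        if not c.isspace():
            kept.append(c)
    return "".join(kept)


print(remove_space_join("strip my spaces"))  # stripmyspaces
```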

Answer 5:

Standard list-filtering techniques apply, although they are not as efficient as the split/join or translate approaches.

We need a set of whitespace characters:

>>> import string
>>> ws = set(string.whitespace)

The filter built-in:

>>> "".join(filter(lambda c: c not in ws, "strip my spaces"))
'stripmyspaces'

A list comprehension (yes, with the brackets; see the benchmarks below):

>>> import string
>>> "".join([c for c in "strip my spaces" if c not in ws])
'stripmyspaces'

A fold (with an empty-string initial value, so a leading whitespace character is not kept as the seed):

>>> import functools
>>> functools.reduce(lambda acc, c: acc if c in ws else acc + c, "strip my spaces", "")
'stripmyspaces'

Benchmarks:

>>> from timeit import timeit
>>> timeit('"".join("strip my spaces".split())')
0.17734256500003198
>>> timeit('"strip my spaces".translate(ws_dict)', 'import string; ws_dict = {ord(ws): None for ws in string.whitespace}')
0.457635745999994
>>> timeit('re.sub(r"\s+", "", "strip my spaces")', 'import re')
1.017787621000025

>>> SETUP = 'import string, operator, functools, itertools; ws = set(string.whitespace)'
>>> timeit('"".join([c for c in "strip my spaces" if c not in ws])', SETUP)
0.6484303600000203
>>> timeit('"".join(c for c in "strip my spaces" if c not in ws)', SETUP)
0.950212219999969
>>> timeit('"".join(filter(lambda c: c not in ws, "strip my spaces"))', SETUP)
1.3164566040000523
>>> timeit('"".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))', SETUP)
1.6947649049999995
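The timings above come from separate timeit calls with slightly different setups, and absolute numbers will vary by machine and Python version. A sketch for rerunning the main candidates side by side in one script, using only the standard library; the candidate labels, CANDIDATES and run_benchmark are names chosen for this sketch:

```python
import re
import string
import timeit

WS = set(string.whitespace)
WS_TABLE = str.maketrans("", "", string.whitespace)
WS_RE = re.compile(r"\s+")

# Each candidate takes a string and returns it without whitespace
CANDIDATES = {
    "split/join": lambda s: "".join(s.split()),
    "translate": lambda s: s.translate(WS_TABLE),
    "compiled re.sub": lambda s: WS_RE.sub("", s),
    "list comprehension": lambda s: "".join([c for c in s if c not in WS]),
}


def run_benchmark(text: str = "strip my spaces", number: int = 100_000) -> dict:
    expected = "".join(text.split())
    timings = {}
    for name, func in CANDIDATES.items():
        # Sanity check: on ASCII-only input every candidate should agree
        if func(text) != expected:
            raise ValueError(f"{name} returned {func(text)!r}")
        # The lambda is timed immediately, so capturing func here is safe
        timings[name] = timeit.timeit(lambda: func(text), number=number)
    return timings


if __name__ == "__main__":
    for name, seconds in sorted(run_benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {seconds:.4f} s")
```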

Answer 6:

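TL;DR

This solution was tested with Python 3.6.

To remove all spaces from a string in Python 3, you can use the following function:

def remove_spaces(in_string: str):
    return in_string.translate(str.maketrans({' ': ''}))

To remove any whitespace character (' \t\n\r\x0b\x0c'), you can use the following function: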

import string
def remove_whitespace(in_string: str):
    return in_string.translate(str.maketrans(dict.fromkeys(string.whitespace)))
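string.whitespace lists only the six ASCII whitespace characters. If the input may contain Unicode whitespace (no-break space, ideographic space, line separator and so on), one option is to build the deletion table from every code point for which str.isspace() is true. This is a sketch; UNICODE_WS_TABLE and remove_unicode_whitespace are hypothetical names, and scanning all ~1.1 million code points once at import takes a noticeable fraction of a second:

```python
import sys

# Map every code point that str.isspace() treats as whitespace to None,
# which tells str.translate to delete it
UNICODE_WS_TABLE = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()
)


def remove_unicode_whitespace(in_string: str) -> str:
    return in_string.translate(UNICODE_WS_TABLE)


print(remove_unicode_whitespace("strip\u00a0my\u3000spaces"))  # stripmyspaces
```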

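Explanation

Python's str.translate is a built-in method of str: it takes a translation table and returns a copy of the string in which each character has been mapped through that table. Full documentation for str.translate

The translation table is created with str.maketrans, a static method of str. Here we pass it a single argument, a dictionary whose keys are the characters to replace and whose values are their replacements (None means the character is deleted). It returns a translation table suitable for str.translate. Full documentation for str.maketrans

Python's string module contains some common string operations and constants. string.whitespace is a string constant containing all ASCII characters considered whitespace: space, tab, linefeed, carriage return, form feed, and vertical tab. Full documentation for string

In the second function, dict.fromkeys creates a dictionary whose keys are the characters of string.whitespace, each with the value None. Full documentation for dict.fromkeys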
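To see what the explanation describes, a short sketch that inspects the table built from dict.fromkeys: str.maketrans converts the character keys into code points, every value is None (delete), and the three-argument form str.maketrans('', '', chars) produces the same table:

```python
import string

table = str.maketrans(dict.fromkeys(string.whitespace))

print(sorted(table))        # [9, 10, 11, 12, 13, 32]  (code points, not characters)
print(set(table.values()))  # {None}  (None means "delete this character")

# The three-argument form builds an equal deletion table
print(table == str.maketrans("", "", string.whitespace))  # True

print("strip my\tspaces".translate(table))  # stripmyspaces
```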

Answer 7:

As Roger Pate said, the following code works for me:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

I'm running the following code in a Jupyter Notebook:

i=0
ProductList=[]
while i < len(new_list): 
   temp=''                            # new_list[i]=temp=' Plain   Utthapam  '
   #temp=new_list[i].strip()          #if we want o/p as: 'Plain Utthapam'
   temp="".join(new_list[i].split())  #o/p: 'PlainUtthapam' 
   temp=temp.upper()                  #o/p:'PLAINUTTHAPAM' 
   ProductList.append(temp)
   i=i+2
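The same loop can be written more idiomatically with a slice and a list comprehension. This is a sketch: new_list below is made-up sample data standing in for the notebook's list, and product_list is a renamed ProductList. Slicing with [::2] reproduces the i=i+2 stepping (indices 0, 2, 4, ...):

```python
# Hypothetical sample data standing in for the notebook's new_list
new_list = [" Plain   Utthapam  ", "40", " Masala  Dosa ", "55"]

# Take every other item, remove all whitespace, then upper-case it
product_list = ["".join(item.split()).upper() for item in new_list[::2]]
print(product_list)  # ['PLAINUTTHAPAM', 'MASALADOSA']
```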

Answer 8:

Alternatively, in Python 2 (after import string):

"strip my spaces".translate(None, string.whitespace)

Here is the Python 3 version:

import string
"strip my spaces".translate(str.maketrans('', '', string.whitespace))
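One caveat worth checking: string.whitespace contains only ASCII characters, so the translate approach leaves Unicode whitespace such as U+2028 LINE SEPARATOR or U+00A0 NO-BREAK SPACE in place, while str.split() without arguments also splits on those. A quick check with made-up sample text:

```python
import string

text = "strip\u2028my\u00a0spaces"

# translate() with string.whitespace only deletes the six ASCII characters
ascii_only = text.translate(str.maketrans("", "", string.whitespace))
print("\u2028" in ascii_only, "\u00a0" in ascii_only)  # True True

# split() with no argument uses Unicode whitespace rules
unicode_aware = "".join(text.split())
print(unicode_aware)  # stripmyspaces
```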

Comments:

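This seems like the most Pythonic option. Why hasn't it been voted to the top?

The Python 3 code in the answer does work. @DanMenes's comment is outdated.

NameError: name 'string' is not defined.

@ZelphirKaltstahl You need to import string.

string.whitespace contains only ASCII whitespace, so this fails on strings containing something like U+2028 LINE SEPARATOR.

Answer 9: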

import re
re.sub(' ','','strip my spaces')

Comments:

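Welcome to SO. We appreciate your answer, but it would be better if it added value on top of the other answers. In this case it doesn't, because another user has already posted this solution. If a previous answer helped you, you should upvote it once you have enough reputation.

This doesn't answer the question "how to remove all whitespace"; it only removes spaces.

Answer 10: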

Take advantage of the behavior of str.split when called without a sep argument:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

If you only want to remove spaces rather than all whitespace:

>>> s.replace(" ", "")
'\tfoo\nbar'

Premature optimization

Although efficiency isn't the primary goal (writing clear code is), here are some initial timings:

$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop

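Note that the regex is cached, so it's not as slow as you might imagine. Compiling it beforehand helps somewhat, but only matters in practice if you call it many times: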

$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop

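Even though re.sub is 11.3x slower, remember that your bottlenecks are almost certainly elsewhere. Most programs would not notice the difference between any of these 3 options.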
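If you do call the regex many times, a minimal sketch of the precompiled variant wrapped in a helper; _WHITESPACE_RE and strip_all_whitespace are hypothetical names. re.sub() also caches compiled patterns internally, so holding the compiled object mainly saves that cache lookup on each call:

```python
import re

# Compile once at module level and reuse the pattern object
_WHITESPACE_RE = re.compile(r"\s+")


def strip_all_whitespace(s: str) -> str:
    return _WHITESPACE_RE.sub("", s)


print(strip_all_whitespace(" \t foo \n bar "))  # foobar
```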

Comments:

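It's probably slower than a \s+ substitution. I'd stick with re.

@OTZ: You might be surprised, but see the "remember" note.

@Roger Hmm, interesting. Have you tried the s.translate method? It probably beats all the methods shown on this page.

@Roger Pate: You don't need the 'table' argument for translate; it can be None, although, surprisingly, that makes it slower...

Try myString.translate(None, " \t\r\n\v"). It takes only 83% of the time of Roger's fastest (split and join) technique. I'm not sure it covers all the whitespace characters split does, but it's probably enough for most ASCII applications.

Answer 11: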

The simplest way is to use replace:

"foo bar\t".replace(" ", "").replace("\t", "")

Or use a regular expression:

import re
re.sub(r"\s", "", "foo bar\t")
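The two approaches are not equivalent: the replace() chain removes only the characters it names, while \s matches every whitespace character. A quick check with a made-up string that also contains a newline:

```python
import re

s = "foo bar\t\nbaz"

# The chained replace() leaves the newline in place
print(repr(s.replace(" ", "").replace("\t", "")))  # 'foobar\nbaz'

# \s matches spaces, tabs, newlines and other whitespace
print(re.sub(r"\s", "", s))  # foobarbaz
```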

Answer 12:

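Try a regular expression with re.sub. You can search for all whitespace and replace it with an empty string.

\s in your pattern matches whitespace characters, not just spaces (tabs, newlines, etc.). You can read more about it in the manual.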

Comments:

I don't know how to use regular expressions :(

@wrongusername: Updated the link to the re module manual page.
