Elegant String Parsing in Python3 [duplicate]

Posted 2023-02-23

【Title】Elegant String Parsing in Python3 [duplicate] 【Posted】2016-10-13 06:17:55 【Question】:

I have strings that need to be put into lists; for example, I need

C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C

to become

['C', 'C', '.0033', '.0016', 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4', 'C']

So everything inside quotes becomes a single list element; otherwise, everything separated by whitespace becomes its own list element.

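My first idea was a simple split, putting the items that don't contain ' into a new array and then joining the items of the quoted part back together: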

>>> s.split()
['C', 'C', '.0033', '.0016', "'International", 'Tables', 'Vol', 'C', 'Tables', '4.2.6.8', 'and', "6.1.1.4'", 'C']
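>>> parts = s.split()   # iterate over the split tokens, not the characters of s
>>> arr = []
>>> i = 0
>>> while i < len(parts):
...     if parts[i].startswith("'"):
...         v = []   # collect the tokens of the quoted run
...         while not parts[i].endswith("'"):
...             v.append(parts[i])
...             i += 1
...         v.append(parts[i])
...         arr.append(" ".join(v).strip("'"))   # rejoin and drop the quotes
...     else:
...         arr.append(parts[i])
...     i += 1   # advance past the token just handled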

But this strategy is quite ugly, and on top of that I have to assume the string was split on single spaces.

s.partition("'") looks promising:

>>> s.partition("'")
('C C .0033 .0016 ', "'", "International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C")

But this is awkward, because I have to partition again on each iteration, and which piece is inside the quotes depends on context.
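The context dependence described here can be made explicit: splitting on the quote character yields segments that alternate between outside quotes and inside quotes. The following is only a minimal sketch, assuming quotes are balanced and never escaped; `split_quoted` is a hypothetical helper name, not part of the original post.

```python
def split_quoted(text, quote="'"):
    """Split on whitespace, keeping quoted runs whole (hypothetical helper)."""
    items = []
    # text.split(quote) alternates: even indexes are outside quotes,
    # odd indexes are inside quotes (assuming balanced, unescaped quotes).
    for index, segment in enumerate(text.split(quote)):
        if index % 2 == 1:
            items.append(segment)          # inside quotes: keep as one item
        else:
            items.extend(segment.split())  # outside quotes: split on whitespace
    return items


s = "C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C"
result = split_quoted(s)
```

This avoids repeated partitioning, but it silently misparses input with an unmatched quote, so it is best treated as an illustration rather than a robust parser.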

Is there a simple Python3 way to split this string as described above?

【Question comments】:

【Answer 1】:

You can use the shlex module. Example:

import shlex

print(shlex.split("C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C"))
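As a rough illustration of why this works: `shlex.split` applies shell-like quoting rules, so a quoted run is treated as one token. The sketch below shows the default behavior, the `posix=False` variant that keeps the quote characters, and what happens with an unmatched quote. The variable names are illustrative only.

```python
import shlex

line = "C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C"

# Default (POSIX) mode: quotes group words and are removed from the token.
tokens = shlex.split(line)

# Non-POSIX mode: quotes still group words but stay part of the token.
raw_tokens = shlex.split(line, posix=False)

# An unmatched quote is reported as an error instead of being guessed at.
try:
    shlex.split("C 'unterminated")
    unbalanced_accepted = True
except ValueError:
    unbalanced_accepted = False
```

Because the rules are shell-like, backslashes and double quotes are also given special meaning in the default mode, which may matter if the input data contains them.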

【Comments】:

Gosh, how does it know? Poor choice of words. Edited @NickThompson
