TypeError: expected string or bytes-like object when filtering a nested list of strings with RegEx

Posted 2021-05-05


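I have this nested list of strings, which is the last stage of my cleaning. I want to replace the non-letters in the nested list with a space, or create a new list without them. Here is my list: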

list = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'], ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'], ['the', 'sky', 'is', 'pinkish-blue', '.'], ['you', 'should', "n't", 'eat', 'cardboard', '.']]

This is the pattern I want to use to clean it:

pattern = re.compile(r'\W+')
newlist = list(filter(pattern.search, list))
print(newlist)

The code doesn't work; here is the error I get:

Traceback (most recent call last):
  File "/Users/art/Desktop/TxtProcessing/regexp", line 28, in <module>
    newlist = [list(filter(pattern.search, list))]
TypeError: expected string or bytes-like object

I know that list is not a string but a list of lists of strings. How do I fix this? Any help would be much appreciated!
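As a quick check of the cause, the sketch below (the names `nested` and `single` are illustrative, not from the question) shows that a compiled pattern's `search` accepts a single string but raises `TypeError` when handed a list. That is exactly what happens when `filter` iterates the outer list and passes each inner list to `pattern.search`.

```python
import re

pattern = re.compile(r"\W+")  # one or more non-word characters

nested = [["hello", "mr."], ["pinkish-blue", "."]]  # hypothetical sample data

# A single string is fine: search returns a match object or None.
single = pattern.search("mr.")
print(single.group())           # .
print(pattern.search("hello"))  # None

# filter() iterates the OUTER list, so pattern.search receives whole inner lists.
error = None
try:
    list(filter(pattern.search, nested))
except TypeError as exc:
    error = exc
print(type(error).__name__, error)
```

The exact wording of the message varies slightly between Python versions, but it always names the expected string or bytes-like argument.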

Answer

You need to drill down into the inner lists:

import re

list_ = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'], ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'], ['the', 'sky', 'is', 'pinkish-blue', '.'], ['you', 'should', "n't", 'eat', 'cardboard', '.']]

pattern = re.compile(r'\W+')

newlist_ = [item 
            for sublist_ in list_ 
            for item in sublist_ 
            if pattern.search(item)]

print(newlist_)
# ['mr.', ',', '?', ',', '.', 'pinkish-blue', '.', "n't", '.']
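The comprehension above flattens the result and keeps only the tokens that contain a non-word character. The question asks for the opposite (dropping non-letters, or replacing them with a space), so here is a minimal sketch that keeps one inner list per sentence. `sentences`, `kept` and `replaced` are illustrative names; `\W` treats digits and `_` as word characters.

```python
import re

# The asker's data, renamed so the built-in list() stays usable.
sentences = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'],
             ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'],
             ['the', 'sky', 'is', 'pinkish-blue', '.'],
             ['you', 'should', "n't", 'eat', 'cardboard', '.']]

pattern = re.compile(r'\W+')  # one or more non-word characters

# Option 1: drop every token that contains a non-word character,
# keeping the per-sentence nesting.
kept = [[tok for tok in sent if not pattern.search(tok)] for sent in sentences]

# Option 2: replace each run of non-word characters with a single space
# instead of dropping the token.
replaced = [[pattern.sub(' ', tok) for tok in sent] for sent in sentences]

print(kept)
print(replaced)
```

With option 2, `'pinkish-blue'` becomes `'pinkish blue'` and `"n't"` becomes `'n t'`, while pure punctuation tokens such as `'.'` become `' '`.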

Also, you should not name a variable list, because that shadows the built-in list type.
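To see why the name matters: assigning to `list` hides the built-in type for the rest of the module, so a call such as `list(filter(...))` fails on the call itself. A small demonstration, which restores the built-in afterwards with `del` (the sample data and the `message` name are illustrative):

```python
# Shadow the built-in on purpose to show the failure mode.
list = [['hello', 'mr.'], ['the', 'sky']]

message = ""
try:
    list(filter(None, list))  # 'list' now names the data, not the type
except TypeError as exc:
    message = str(exc)        # e.g. "'list' object is not callable"
    print(message)

del list  # remove the module-level name; the built-in is visible again
print(list("ab"))  # ['a', 'b']
```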

Another answer

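You are passing lists to pattern.search (filter hands it each inner list), but it only accepts strings, since a string is what the pattern is matched against. Loop over the inner lists instead: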

import re
l = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'], ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'], ['the', 'sky', 'is', 'pinkish-blue', '.'], ['you', 'should', "n't", 'eat', 'cardboard', '.']]
new_l = [[b for b in i if re.findall(r'^\w+$', b)] for i in l]  # keep tokens made only of word characters
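Here a `re.findall` result is used only for its truthiness; `re.fullmatch` expresses "the whole token matches" more directly. Keep in mind that `\w` also matches digits and the underscore, so tokens like `'2021'` or `'snake_case'` would survive. The sketch below contrasts it with the letters-only class `[^\W\d_]`; the sample tokens are illustrative.

```python
import re

tokens = ['hello', 'mr.', "n't", '2021', 'snake_case', 'pinkish-blue']  # illustrative

# \w+ : letters, digits and underscore
word_chars = [t for t in tokens if re.fullmatch(r'\w+', t)]

# [^\W\d_] : a word character that is neither a digit nor '_', i.e. a letter
letters_only = [t for t in tokens if re.fullmatch(r'[^\W\d_]+', t)]

print(word_chars)    # ['hello', '2021', 'snake_case']
print(letters_only)  # ['hello']
```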

Also, note that your original variable name, list, shadows the built-in list function: the assignment rebinds the name list to your data.

Another answer

First, shadowing built-in names such as list can cause all kinds of trouble, so choose your variable names carefully.

You don't actually need a regular expression here; there is a built-in isalpha() string method:

Return true if all characters in the string are alphabetic and there is at least one character, false otherwise.
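A few edge cases of `str.isalpha()` that matter for this data: punctuation, apostrophes, hyphens, digits and the empty string all give `False`, while non-ASCII letters count as alphabetic. The sample tokens below are illustrative.

```python
# Illustrative tokens, including the troublesome ones from the question.
samples = ['hello', "n't", 'pinkish-blue', '', 'café', '2021']

results = {s: s.isalpha() for s in samples}
for token, ok in results.items():
    print(repr(token), ok)
```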

In [1]: l = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'], ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'], ['the', 'sky', 'is', 'pinkish-blue', '.'], ['you', 'should', "n't", 'eat', 'cardboard', '.']]

In [2]: [[item for item in sublist if item.isalpha()] for sublist in l]
Out[2]:
[['hello', 'smith', 'how', 'are', 'you', 'doing', 'today'],
 ['the', 'weather', 'is', 'great', 'and', 'python', 'is', 'awesome'],
 ['the', 'sky', 'is'],
 ['you', 'should', 'eat', 'cardboard']]

The same filtering logic can also be applied using map and filter.
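A minimal sketch of the map/filter variant, assuming `functools.partial` is used to fix `str.isalpha` as filter's predicate (a lambda would work equally well; `keep_alpha` and `result` are illustrative names). The data is the asker's list, stored as `l`.

```python
from functools import partial

l = [['hello', 'mr.', 'smith', ',', 'how', 'are', 'you', 'doing', 'today', '?'],
     ['the', 'weather', 'is', 'great', ',', 'and', 'python', 'is', 'awesome', '.'],
     ['the', 'sky', 'is', 'pinkish-blue', '.'],
     ['you', 'should', "n't", 'eat', 'cardboard', '.']]

# partial(filter, str.isalpha) is a one-argument function: sublist -> filter object
keep_alpha = partial(filter, str.isalpha)

# map applies it to each sublist; list() materialises each lazy filter object
result = [list(sublist) for sublist in map(keep_alpha, l)]
print(result)
```

This yields the same nested result as the list comprehension; the comprehension is usually considered the more readable of the two.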
