ValueError is raised even though the code has the given values
【Title】: ValueError raises even when code has given values
【Posted】: 2021-10-01 06:01:25
【Question】: I am trying to write a piece of code that helps me remove all conjunctions, pronouns, punctuation, and so on.
macbeth = open("macbeth.txt", "r")
contents = macbeth.read()
contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    for x in punctuations:
        file_contents.remove(x)
    for x in uninteresting_words:
        file_contents.remove(x)
    return file_contents

print(remove_uninteresting_stuff(contents))
This code raises the following error:
Traceback (most recent call last):
  File "testingFile.py", line 33, in <module>
    print(remove_uninteresting_stuff(contents))
  File "testingFile.py", line 25, in remove_uninteresting_stuff
    file_contents.remove(x)
ValueError: list.remove(x): x not in list
Now, obviously, in a novel like Macbeth (Shakespeare's), these words would exist.
Can anyone explain this error and help me fix it?
【Comments】:
Which word fails? Add print(x) just before the failing line.
Or, if you don't care that the text lacks a particular word, just wrap file_contents.remove in a try/except block.
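A minimal sketch of the try/except suggestion, assuming a short made-up token list in place of the Macbeth file. The helper name `remove_if_present` is hypothetical. Note that `list.remove` deletes only the first matching item, so a repeated word survives a single call.

```python
def remove_if_present(tokens, unwanted):
    """Remove the first occurrence of each unwanted item, skipping absent ones."""
    for item in unwanted:
        try:
            tokens.remove(item)  # raises ValueError when item is not in tokens
        except ValueError:
            pass  # the text simply does not contain this item
    return tokens


# Made-up sample standing in for the file contents
sample = "the king and the queen".split()
result = remove_if_present(sample, ["the", "and", "!"])
print(result)  # ['king', 'the', 'queen']
```

The second "the" is still present, which is why this approach alone does not strip every unwanted word.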
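【Answer 1】:
You are assuming that every word and punctuation mark in your lists occurs in Macbeth, but that is not the case. list.remove(x) raises ValueError as soon as x is missing from the list.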
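To see why an item can be missing, here is a small sketch that uses one short sample line instead of the whole file (the sample and variable names are assumptions for illustration). After `split()`, punctuation stays attached to the neighbouring word, so a standalone `!` token is usually absent.

```python
# A short sample line standing in for the file contents
tokens = "Out, damned spot! Out, I say!".split()
print(tokens)  # ['Out,', 'damned', 'spot!', 'Out,', 'I', 'say!']

# "!" only occurs glued to other characters, never as its own token
bang_is_token = "!" in tokens
print(bang_is_token)  # False

# Removing an absent item raises ValueError, as in the traceback
try:
    tokens.remove("!")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```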
Another way to write it that should work:
# Open the file in a with block so it is closed after reading
with open("macbeth.txt", "r") as macbeth:
    contents = macbeth.read()
contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # Keep only the tokens that are neither unwanted words nor punctuation
    file_contents = [word for word in file_contents if word not in uninteresting_words and word not in punctuations]
    return file_contents

print(remove_uninteresting_stuff(contents))
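The difference is that here you keep only the words that are not in your list of unwanted words, instead of trying to remove each unwanted word from the contents whether or not it is there.
Since you cannot be sure that an unwanted word appears in the contents, you would have to check that it exists before removing it. That amounts to keeping only the words that are absent from the unwanted list, as I did in the code snippet.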
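The same filtering idea as a compact sketch, assuming a shortened made-up word list. The names `UNWANTED` and `keep_interesting` are hypothetical, and using a set for membership tests is a swapped-in choice here; the answer itself uses a list.

```python
UNWANTED = {"the", "a", "and", "of"}  # shortened, hypothetical stop-word set


def keep_interesting(tokens, unwanted=UNWANTED):
    """Return a new list holding only the tokens that are not unwanted."""
    return [token for token in tokens if token not in unwanted]


sample = "the thane of cawdor and the king".split()
filtered = keep_interesting(sample)
print(filtered)  # ['thane', 'cawdor', 'king']
```

Both occurrences of "the" are dropped, and "a", which never appears in the sample, causes no error.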
Update
If the punctuation you want to remove is part of a word, the code snippet above will not work (surprise!).
This, on the other hand, does work:
contents = "The the a to To IF is OF and and or here when where where how all ANY any both few whom who wHo!! -;."
contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # Strip every punctuation character from each word
    file_contents = [word.translate(str.maketrans('', '', punctuations)) for word in file_contents]
    # Drop tokens that became empty (such as "-;.") and compare in lowercase
    file_contents = [word for word in file_contents if word and word.lower() not in uninteresting_words]
    return file_contents

print(remove_uninteresting_stuff(contents))
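How the `translate` step behaves can be seen in isolation in the sketch below, which assumes a shortened punctuation set and a made-up sample line. The third argument of `str.maketrans` lists the characters to delete, and a token made only of punctuation becomes an empty string.

```python
punctuations = "!?.,;:-"  # shortened, hypothetical punctuation set

# Build the deletion table once: the third argument lists characters to delete
delete_table = str.maketrans("", "", punctuations)

words = "Stand, and unfold yourself! - Long live the king!".split()
stripped = [word.translate(delete_table) for word in words]
print(stripped)
# ['Stand', 'and', 'unfold', 'yourself', '', 'Long', 'live', 'the', 'king']

# A token that was pure punctuation is now '', so filter it out
non_empty = [word for word in stripped if word]
print(non_empty)
# ['Stand', 'and', 'unfold', 'yourself', 'Long', 'live', 'the', 'king']
```

Building the table once, outside the comprehension, avoids recreating it for every word.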
【Discussion】:
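What do you mean by not working? Did it fail to remove all the unwanted words? If so, you may want to normalize your content first; in this case, all words should be lowercased before the check.
That didn't work either. I think the problem is that it only removes one of each kind of unwanted word, not all of them.
Check the answer update; the output is an empty list, which means it removed all the unwanted words.
Still doesn't work. Try downloading Macbeth from a site such as projectgutenburg.org and running your code on it; it definitely doesn't work.
I did, and it worked. Can you update your question with a snippet of Macbeth that isn't filtered out when you run my snippet?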