ValueError raised even when the code has the given values



Posted: 2021-10-01 06:01:25

Question:

I'm trying to write a piece of code that removes all the conjunctions, pronouns, punctuation and so on from a text.

macbeth = open("macbeth.txt", "r")

contents = macbeth.read()

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    for x in punctuations:
        file_contents.remove(x)
        
        
    for x in uninteresting_words:
        file_contents.remove(x)

    return file_contents

print(remove_uninteresting_stuff(contents))

This code raises the following error:

Traceback (most recent call last):
  File "testingFile.py", line 33, in <module>
    print(remove_uninteresting_stuff(contents))
  File "testingFile.py", line 25, in remove_uninteresting_stuff
    file_contents.remove(x)
ValueError: list.remove(x): x not in list

Surely, in a work like Shakespeare's Macbeth, these words are present.

Can anyone explain this error and help me fix it?

Comments:

Which word fails? Add print(x) before the failing line. Or, if you don't care that the text lacks a particular word, just wrap file_contents.remove in a try/except block.

Answer 1:

You are assuming that every word and punctuation mark in your lists appears in Macbeth, but that is not the case.
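To see why, note that str.split() splits on whitespace only, so punctuation stays attached to the neighboring word, and list.remove(x) raises ValueError as soon as x is not an element of the list. Because the loop over punctuations runs first, it most likely fails on its very first character, "!", unless the text happens to contain a standalone "!" token. A minimal demonstration on a made-up sample line (not the actual Macbeth file):

```python
# A made-up sample line; str.split() splits on whitespace only
tokens = "The king is dead, long live the king!".split()
print(tokens[-1])  # 'king!' : the '!' is still attached to the word

# list.remove raises ValueError when the element is absent
try:
    tokens.remove("!")
except ValueError as err:
    print("ValueError:", err)

# list.remove deletes only the FIRST matching element
words = ["the", "king", "the", "thane"]
words.remove("the")
print(words)  # ['king', 'the', 'thane']
```

So even when a word does occur, a single remove call leaves its other occurrences in place.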

Another way to write it, which should work:

# The with statement closes the file automatically
with open("macbeth.txt", "r") as macbeth:
    contents = macbeth.read()

contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[];:'"\,<>./?@#$%^&*_~'''  # raw string: "\," is not a valid escape
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

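    # Keep only tokens that are neither unwanted words nor punctuation.
    # Note: `word not in punctuations` is a substring test on the punctuation string.
    file_contents = [word for word in file_contents if word not in uninteresting_words and word not in punctuations]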

    return file_contents

print(remove_uninteresting_stuff(contents))

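The difference is that here you check whether each word is absent from your list of unwanted words, instead of removing every unwanted word from your contents whether or not it is there.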

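Since you can't be sure that an unwanted word appears in the contents, you would have to check that it exists before removing it. That amounts to keeping only the words that are not in the unwanted list (as I did in the code snippet).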
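The "check first, then remove" approach described above can be sketched as follows. Because remove deletes only the first occurrence, each unwanted word has to be removed repeatedly until none is left. The try/except variant is the one suggested in the question's comments. The function names remove_by_checking and remove_with_try are hypothetical. Both give the same result as the list comprehension, but every in/remove call scans the whole list, so the comprehension is the simpler and faster choice.

```python
def remove_by_checking(tokens, unwanted):
    """Remove every occurrence of each unwanted word, checking before removing."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    for word in unwanted:
        while word in tokens:      # check first ...
            tokens.remove(word)    # ... then remove one occurrence
    return tokens


def remove_with_try(tokens, unwanted):
    """Same idea, but let remove() signal absence via ValueError."""
    tokens = list(tokens)
    for word in unwanted:
        while True:
            try:
                tokens.remove(word)
            except ValueError:     # no occurrences of this word left
                break
    return tokens


sample = ["the", "king", "and", "the", "thane", "and", "the", "witches"]
unwanted = ["the", "and", "of"]  # "of" is absent, yet no error is raised

expected = [w for w in sample if w not in unwanted]
print(expected)                                          # ['king', 'thane', 'witches']
print(remove_by_checking(sample, unwanted) == expected)  # True
print(remove_with_try(sample, unwanted) == expected)     # True
```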

Update

The code snippet above won't work if the punctuation you want to remove is attached to a word (surprise!).
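For example, with the first snippet's filter a token like "who!!" is not equal to "who", so it survives, and "The" survives too because the comparison is case-sensitive. Since `w not in punctuations` is a substring test, a multi-character token such as "-;." is kept as well. A small illustration on made-up tokens, using a shortened word list as an assumption:

```python
tokens = ["The", "who!!", "who", "-;.", "king"]
uninteresting_words = ["the", "who"]  # shortened list for illustration
punctuations = r'''!()-[];:'"\,<>./?@#$%^&*_~'''

# Same filter as the first snippet: exact, case-sensitive matches only
kept = [w for w in tokens if w not in uninteresting_words and w not in punctuations]
print(kept)  # ['The', 'who!!', '-;.', 'king']
```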

This, on the other hand, does work:

contents = "The the a to To IF is OF and and or here when where where how all ANY any both few whom who wHo!! -;."

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

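    # Delete every punctuation character from each token
    file_contents = [word.translate(str.maketrans('', '', punctuations)) for word in file_contents]
    # Compare case-insensitively and drop tokens that became empty (e.g. "-;.")
    file_contents = [word for word in file_contents if word and word.lower() not in uninteresting_words]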

    return file_contents

print(remove_uninteresting_stuff(contents))
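str.maketrans('', '', punctuations) builds a table that maps every character of its third argument to None, and str.translate deletes those characters. Because the table and the word list never change, a slightly tidier sketch builds them once and stores the words in a set for faster membership tests. It also drops tokens that are empty after stripping and, unlike the snippet above, returns lowercased words. The names STOP_WORDS, STRIP_PUNCT and clean_tokens are hypothetical, and the word list is shortened here.

```python
# Shortened stop-word list for illustration; use the full list in practice
STOP_WORDS = {"the", "a", "to", "if", "is", "of", "and", "or", "who", "whom"}
# Same punctuation characters as the answer, built into a deletion table once
STRIP_PUNCT = str.maketrans("", "", r'''!()-[];:'"\,<>./?@#$%^&*_~''')


def clean_tokens(tokens):
    """Strip punctuation, lowercase, and drop stop words and empty tokens."""
    result = []
    for token in tokens:
        word = token.translate(STRIP_PUNCT).lower()
        if word and word not in STOP_WORDS:
            result.append(word)
    return result


print(clean_tokens("The king, WHO!! rules -;. Scotland".split()))
# ['king', 'rules', 'scotland']
```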

Comments:

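What doesn't work? Does it fail to remove all the unwanted words? If so, you may want to normalize your contents first; here that means lowercasing every word before checking.

That didn't work either. I think the problem is that it only removes one of each unwanted word, not all of them.

Check the updated answer: the output is an empty list, which means it removed all the unwanted words.

Still doesn't work. Try downloading Macbeth from a site like projectgutenburg.org and running your code on it; it definitely doesn't work.

I did, and it did work. Could you update your question with a snippet of Macbeth that doesn't get filtered out when you run my snippet?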
