ValueError raises even when code has given values

Posted 2023-03-12

Asked: 2021-10-01 06:01:25

Question:

I am trying to write a piece of code that removes all conjunctions, pronouns, punctuation, and so on from a text.

macbeth = open("macbeth.txt", "r")

contents = macbeth.read()

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    for x in punctuations:
        file_contents.remove(x)
        
        
    for x in uninteresting_words:
        file_contents.remove(x)

    return file_contents

print(remove_uninteresting_stuff(contents))

This code raises the following error:

Traceback (most recent call last):
  File "testingFile.py", line 33, in <module>
    print(remove_uninteresting_stuff(contents))
  File "testingFile.py", line 25, in remove_uninteresting_stuff
    file_contents.remove(x)
ValueError: list.remove(x): x not in list

Now, obviously, in a text like Shakespeare's Macbeth, these words will be present.

Can anyone explain this error and help me fix it?

Comments:

Which word fails? Put print(x) before the failing line. Or, if you do not care that the text lacks a particular word, just wrap file_contents.remove in a try/except block.

Answer 1:

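You are assuming that every entry in your word and punctuation lists occurs in Macbeth, but that is not the case. list.remove raises ValueError as soon as it is asked to remove a value that is not in the list.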
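To make the failure concrete, here is a minimal sketch of how list.remove and str.split behave. The sample sentences and variable names are made up for illustration and do not come from the question.

```python
# Hypothetical sample text, not taken from macbeth.txt.
tokens = "the cat and the hat !".split()

# list.remove raises ValueError when the value is not an element of the list.
try:
    tokens.remove("?")
    missing_raised = False
except ValueError:
    missing_raised = True

# list.remove deletes only the first matching element, so the second
# "the" is still in the list afterwards.
tokens.remove("the")
after_one_remove = list(tokens)

# split() cuts on whitespace only, so punctuation stays glued to words
# and a bare "!" is usually not a token of its own.
glued = "Out, damned spot!".split()

print(missing_raised)
print(after_one_remove)
print(glued)
```

Two things follow from this: the first punctuation character that never appears as a standalone token triggers the error, and even when remove succeeds it leaves later duplicates in place.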

Another way of writing it that might work is:

# Use a context manager so the file is closed after reading
with open("macbeth.txt", "r") as macbeth:
    contents = macbeth.read()

contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    file_contents = [word for word in file_contents if word not in uninteresting_words and word not in punctuations]

    return file_contents

print(remove_uninteresting_stuff(contents))

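The difference is that here you keep a word only if it is not in your list of unwanted words, instead of trying to remove each unwanted word from the contents whether or not it is there.

Because you cannot be sure that an unwanted word occurs in the contents, you would have to check that it exists before removing it, which is equivalent to keeping only the words that are not in the unwanted list (as I did in the code snippet).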
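The equivalence described here can be checked with a small sketch. The function names remove_with_check and keep_wanted and the sample data are hypothetical, introduced only for this illustration.

```python
def remove_with_check(tokens, unwanted):
    """Check-then-remove: delete every occurrence of each unwanted word."""
    result = list(tokens)  # work on a copy, leave the input untouched
    for word in unwanted:
        while word in result:  # the check means remove() never raises
            result.remove(word)
    return result


def keep_wanted(tokens, unwanted):
    """Filter: keep only the tokens that are not unwanted."""
    unwanted_set = set(unwanted)  # a set makes each membership test cheap
    return [token for token in tokens if token not in unwanted_set]


sample = "the king and the queen".split()
unwanted_words = ["the", "and", "zzz"]  # "zzz" is deliberately absent

print(remove_with_check(sample, unwanted_words))
print(keep_wanted(sample, unwanted_words))
```

Both functions return the same list, but the filter walks the tokens once, while the check-then-remove loop rescans the list for every unwanted word.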

Update

If the punctuation you want to remove is part of a word, the snippet above will not work (surprise!).

This, on the other hand, does work:

contents = "The the a to To IF is OF and and or here when where where how all ANY any both few whom who wHo!! -;."

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[];:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    file_contents = [word.translate(str.maketrans('', '', punctuations)) for word in file_contents]
    # Drop tokens left empty after stripping punctuation, then filter case-insensitively
    file_contents = [word for word in file_contents if word and word.lower() not in uninteresting_words]

    return file_contents

print(remove_uninteresting_stuff(contents))
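As a rough illustration of the two steps in this version, the sketch below applies them to a few individual tokens. The token list is made up for the example, and strip_table is a hypothetical name for the translation table.

```python
punctuations = r'''!()-[];:'"\,<>./?@#$%^&*_~'''

# str.maketrans('', '', chars) builds a table that deletes every
# character listed in chars when passed to str.translate.
strip_table = str.maketrans('', '', punctuations)

raw_tokens = ["wHo!!", "-;.", "spot!"]  # illustrative tokens only
stripped = [token.translate(strip_table) for token in raw_tokens]

# A token made only of punctuation becomes an empty string, so it can be
# worth skipping empty results before lowercasing and comparing.
lowered = [token.lower() for token in stripped if token]

print(stripped)
print(lowered)
```

Lowercasing before the membership test is what lets "wHo" match the entry "who" in the unwanted list.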

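Comments:

Not working how? Is it not removing all the unwanted words? If that is the case, you may want to normalize your contents first, which here means lowercasing every word before the check.

That did not work either. I think the problem is that it removes only one occurrence of each unwanted word, not all of them.

Check the answer update. The output is an empty list, which means it removed all the unwanted words.

Still no good. Try downloading Macbeth from a site like projectgutenburg.org and running it through your code; it definitely does not work.

I did, and it did work. Can you update your question with a snippet of Macbeth that is not filtered out when you run my snippet?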
