NLP/TF-IDF: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Posted 2023-03-12

【Title】NLP/TF-IDF: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() 【Posted】2021-08-24 11:01:22 【Question】:

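I am building a content-based recommender system with TF-IDF, and I get this error when I try to implement a function that outputs actual recommendations from the TF-IDF model I have built. Apologies for the formatting, I am new to this: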


# Build a 1-dimensional array with movie titles
titles = new_df['Movie Title']
indices = pd.Series(new_df.index, index=new_df['Movie Title'])

# Function that takes in movie title as input and outputs most similar movies
def get_recommendations(title):
    
    # Get the index of the movie that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies
    sim_scores = sim_scores[1:11]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar movies
    return new_df['Movie Title'].iloc[movie_indices]

Next code block:

 get_recommendations('The Hangover')

 get_recommendations.head(10)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-4e788498ba09> in <module>
----> 1 get_recommendations('The Hangover')
      2 get_recommendations.head(10)

<ipython-input-22-98000fdd3df8> in get_recommendations(title)
     13 
     14     # Sort the movies based on the similarity scores
---> 15     sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
     16 
     17     # Get the scores of the 10 most similar movies

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

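【Comments】:

You may want to check the value of x[1]. This blog post explains it well: akashmittal.com/valueerror-truth-value-array-ambiguous

There is a problem when a < comparison is applied to the values you are trying to sort. What is sim_scores, or, if you prefer, how is the indices Series derived?

【Answer 1】: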

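This error can occur when the same movie title appears under several different index values, because indices[title] then returns several index values instead of one. For example, if the movie title is "The Godfather", the data might contain: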

Movie Title      index
The Godfather    0
The Godfather    1
The Godfather    2

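The lookup behaviour described here can be reproduced in a few lines. This is a minimal sketch, not the asker's actual data: the three-row `new_df` and the hand-written `cosine_sim` matrix are assumptions standing in for the real DataFrame and the real TF-IDF similarity matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the same title appears at positions 0 and 2.
new_df = pd.DataFrame({"Movie Title": ["The Godfather", "Heat", "The Godfather"]})
indices = pd.Series(new_df.index, index=new_df["Movie Title"])

# Stand-in for the real cosine similarity matrix (3 movies x 3 movies).
cosine_sim = np.array([
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.3],
    [0.9, 0.3, 1.0],
])

# A duplicated label returns a Series of positions, not a single integer.
idx = indices["The Godfather"]
print(type(idx).__name__)

# Indexing with several positions selects several rows, so the result is 2-D.
rows = cosine_sim[idx]
print(rows.shape)

# Each x[1] is now a whole row (an array), and sorting compares arrays with <.
sim_scores = list(enumerate(rows))
try:
    sorted(sim_scores, key=lambda x: x[1], reverse=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

With a unique title such as "Heat", `idx` is a single integer, `cosine_sim[idx]` is one row of floats, and the sort runs without error.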
To fix this, remove the duplicate titles from the DataFrame and then rerun the rest of the code.

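# drop_duplicates returns a new DataFrame, so assign the result (do this before building the TF-IDF matrix)
new_df = new_df.drop_duplicates(subset=['Movie Title']).reset_index(drop=True)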
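Two details matter when applying this fix: `drop_duplicates` returns a new DataFrame rather than modifying the original, and the similarity matrix has to be computed from the deduplicated frame so that its row positions match the index. The sketch below illustrates this under stated assumptions: `raw_df`, the hand-written `cosine_sim` values, and the `top_n` parameter are hypothetical and only stand in for the real data and model.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalogue with one duplicated title.
raw_df = pd.DataFrame({
    "Movie Title": ["The Godfather", "Heat", "The Godfather", "Casino"],
})

# Assign the result, and reset the index so labels equal row positions.
new_df = raw_df.drop_duplicates(subset=["Movie Title"]).reset_index(drop=True)

# Stand-in similarity matrix; in a real pipeline it is recomputed from new_df.
cosine_sim = np.array([
    [1.0, 0.6, 0.8],
    [0.6, 1.0, 0.4],
    [0.8, 0.4, 1.0],
])

indices = pd.Series(new_df.index, index=new_df["Movie Title"])
assert indices.index.is_unique  # every title now maps to exactly one row


def get_recommendations(title, top_n=10):
    idx = indices[title]  # a single integer, because titles are unique
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Skip the first entry, which is the movie itself.
    movie_indices = [i for i, _ in sim_scores[1:top_n + 1]]
    return new_df["Movie Title"].iloc[movie_indices]


print(get_recommendations("The Godfather").tolist())  # ['Casino', 'Heat']
```

If the duplicate rows need to stay in the DataFrame, an alternative is to deduplicate only the lookup Series, for example with `indices[~indices.index.duplicated(keep="first")]`, so each title resolves to its first occurrence.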
