NLP/TF-IDF: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Posted: 2021-08-24 11:01:22
Question:
I am building a content-based recommender system with TF-IDF. I get this error when I try to implement a function that outputs actual recommendations from the TF-IDF model I have built. Apologies for the formatting, I am new to this:
# Build a 1-dimensional array with movie titles
titles = new_df['Movie Title']
indices = pd.Series(new_df.index, index=new_df['Movie Title'])

# Function that takes in movie title as input and outputs most similar movies
def get_recommendations(title):
    # Get the index of the movie that matches the title
    idx = indices[title]
    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get the scores of the 10 most similar movies
    sim_scores = sim_scores[1:11]
    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]
    # Return the top 10 most similar movies
    return new_df['Movie Title'].iloc[movie_indices]
Next code block:
get_recommendations('The Hangover')
get_recommendations.head(10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-4e788498ba09> in <module>
----> 1 get_recommendations('The Hangover')
2 get_recommendations.head(10)
<ipython-input-22-98000fdd3df8> in get_recommendations(title)
13
14 # Sort the movies based on the similarity scores
---> 15 sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
16
17 # Get the scores of the 10 most similar movies
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
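Comments:
You may want to check the value of x[1]. This blog post explains it well: akashmittal.com/valueerror-truth-value-array-ambiguous
The < comparison on the values you are trying to sort is what fails. What is sim_scores, or, if you prefer, how is the indices Series derived?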
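The point made in the comments can be reproduced in isolation. The sketch below uses made-up numbers (the `rows` array is a hypothetical stand-in, not the asker's data) and assumes only that NumPy is installed. It shows that sorted() raises the same ValueError when each x[1] is an array, and works when each x[1] is a scalar:

```python
import numpy as np

# Hypothetical data: two rows of similarity scores.
rows = np.array([[0.1, 0.9], [0.8, 0.2]])

# Case 1: each x[1] is a whole row (an array), so "<" between two keys
# yields a boolean array, and sorted() cannot reduce it to True/False.
array_pairs = list(enumerate(rows))
try:
    sorted(array_pairs, key=lambda x: x[1], reverse=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Case 2: each x[1] is a single score, so the comparison is unambiguous.
scalar_pairs = list(enumerate(rows[0]))
ranked = sorted(scalar_pairs, key=lambda x: x[1], reverse=True)
ranked_positions = [position for position, _ in ranked]
print(ranked_positions)
```

So the useful thing to inspect in get_recommendations is whether cosine_sim[idx] is one row or several, which depends on whether idx is a single integer.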
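Answer 1:
This error can occur when the same movie title appears under several different index values. In that case indices[title] returns several positions instead of one, so cosine_sim[idx] contains several rows and the sort ends up comparing whole arrays. For example, if the movie title is "The Godfather", the data may contain: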
movie title index
The Godfather 0
The Godfather 1
The Godfather 2
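To overcome this, remove the duplicate titles from the DataFrame and then run the rest of the program again. Note that drop_duplicates returns a new DataFrame, so the result has to be assigned back:
# Do this before building the TF-IDF matrix and cosine_sim, so row positions stay aligned
new_df = new_df.drop_duplicates(subset=['Movie Title']).reset_index(drop=True)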
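A minimal sketch of the effect of duplicate titles, using a toy DataFrame and an identity matrix as a stand-in for the real cosine_sim (both are assumptions for illustration, not the asker's data). It shows that looking up a duplicated title returns a Series of positions, which selects several rows of the similarity matrix, while the lookup returns a single integer after deduplication:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one duplicated title.
df = pd.DataFrame({"Movie Title": ["The Godfather", "The Hangover", "The Godfather"]})
indices = pd.Series(df.index, index=df["Movie Title"])

# A duplicated label returns a Series with one entry per occurrence.
dup_lookup = indices["The Godfather"]
print(type(dup_lookup).__name__, len(dup_lookup))

# Stand-in similarity matrix; indexing with several positions gives a 2-D result.
cosine_sim = np.eye(3)
selected = cosine_sim[dup_lookup.to_numpy()]
print(selected.shape)

# After dropping duplicates and resetting the index, the lookup is a scalar.
deduped = df.drop_duplicates(subset=["Movie Title"]).reset_index(drop=True)
dedup_indices = pd.Series(deduped.index, index=deduped["Movie Title"])
single_lookup = dedup_indices["The Godfather"]
print(single_lookup)
```

In the real program the TF-IDF matrix and cosine_sim would need to be rebuilt from the deduplicated DataFrame, otherwise the positions in indices no longer match the rows of the matrix.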