SelectKBest returns nan values for the scores
【Title】The SelectKBest gives the scores as nan values
【Posted】2019-09-21 01:33:14
【Question】I have a dataset and I am trying to get the feature importances using SelectKBest with chi2, but SelectKBest returns nan for every feature score.
The data file and the code file are available at this link.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Path to the data file
file_path = r"D:\Data_Sets\Mobile_Prices\data.csv"
# Read the data file into the south_data data frame
south_data = pd.read_csv(file_path)
# Print the number of data points and the number of columns of the south_data data frame
print("The number of data points in the data :", south_data.shape[0])
print("The number of columns in the data :", south_data.shape[1])
# Printing the head of south_data data frame
print(south_data.head())
# Check for the nulls
print(south_data.isnull().sum())
# Separate the x and y
x = south_data.drop("tss", axis = 1)
y = south_data["tss"]
# Find the scores of features
bestfit = SelectKBest(score_func=chi2, k=5)
features = bestfit.fit(x,y)
x_new = features.transform(x)
print(features.scores_)
# The output of features.scores_ is displayed as
# array([nan, nan, nan, nan, nan, nan, nan, nan, nan])
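【Answer 1】: Every value in your target variable is 1. That is why scores_ contains nan values: chi2 measures the dependence between each feature and the class label, and with a single class there is nothing to compare against. So check your target variable.
For illustration only: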
>>> from sklearn.datasets import load_digits
>>> import numpy as np
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> feature_selector = SelectKBest(chi2, k=20)
>>> X_new = feature_selector.fit_transform(X, np.ones(len(X)))
>>> feature_selector.scores_
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
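A quick way to catch this before fitting is to count the distinct values in the target. The helper below is only a sketch: the name check_target_has_multiple_classes is hypothetical (it is not part of scikit-learn), and it relies only on the standard library, so it accepts any iterable of labels, such as a list or a pandas Series.

```python
from collections import Counter


def check_target_has_multiple_classes(y):
    """Return the class counts of y; raise ValueError if fewer than two classes are present.

    Hypothetical helper for illustration, not a scikit-learn function.
    """
    counts = Counter(y)
    if len(counts) < 2:
        raise ValueError(
            f"target has {len(counts)} distinct value(s): {dict(counts)}; "
            "chi2 needs at least two classes to produce meaningful scores"
        )
    return counts


# A constant target, like the one in the question, is rejected up front
try:
    check_target_has_multiple_classes([1, 1, 1, 1])
except ValueError as err:
    print("bad target:", err)

# A target with two classes passes and reports the class balance
print(check_target_has_multiple_classes([0, 1, 1, 0, 1]))
```

With the question's data, this check would be called on south_data["tss"] right before bestfit.fit(x, y).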
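【Comments】:
Thanks, ai_learning. I have now changed my y and it works. Thanks again for pointing out my mistake.
【Answer 2】: 'bestfit' is an object, so you do not need to assign the result of the fit method to a variable. Try: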
# Find the scores of features
bestfit = SelectKBest(score_func=chi2, k=5)
bestfit.fit(x,y)
x_new = bestfit.transform(x)
print(bestfit.scores_)
Alternatively, you can call fit and transform in a single step:
# Find the scores of features
bestfit = SelectKBest(score_func=chi2, k=5)
x_new = bestfit.fit_transform(x, y)
print(bestfit.scores_)
Does this solve your problem?
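As a side note, scikit-learn estimators return themselves from fit, so the object assigned in the question and the selector itself are the same, and both call styles expose the same scores_. The sketch below checks this on a small made-up array (the values of X and y are invented for illustration and are unrelated to the question's data file).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Tiny made-up, non-negative feature matrix and a target with two classes
X = np.array([[1, 0, 3],
              [0, 2, 1],
              [4, 1, 0],
              [2, 3, 5]])
y = np.array([0, 1, 0, 1])

selector = SelectKBest(score_func=chi2, k=2)
fitted = selector.fit(X, y)

# fit returns the estimator itself, so both names refer to one object
print(fitted is selector)

# With two classes present, the scores are ordinary numbers rather than nan
print(selector.scores_)

# transform keeps the k best-scoring columns
X_new = selector.transform(X)
print(X_new.shape)
```

Because the two names point to the same fitted object, switching between them does not change the scores; only the data passed to fit does.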
【Comments】:
Hi Matt, I tried both of the solutions above, but neither worked. The output of both solutions is [nan nan nan nan nan nan nan nan]