Python anomaly sample detection: error during cross-validation?

Source code:

from sklearn.model_selection import KFold
from sklearn.metrics import recall_score

fold = KFold(5, shuffle=False)
recall_accs = []
c_param_range = [0.01, 0.1, 1, 10, 100]
results_table = pd.DataFrame(columns=['C value', 'Mean recall score'])
results_table['C value'] = c_param_range
j = 0
for c_param in c_param_range:
    print('C value:', c_param)
    recall_accs = []
    for iteration, indices in enumerate(fold.split(y_train)):
        lr = LR(C=c_param, penalty='12')
        X_train = X_train.reset_index(drop=True)
        y_train = y_train.reset_index(drop=True)
        lr.fit(X_train.iloc[indices[0], :], y_train.iloc[indices[0]].values.ravel())
        y_pred = lr.predict(X_train.iloc[indices[1], :].values)
        recall_acc = recall_score(y_train.iloc[indices[1]].values, y_pred)
        recall_accs.append(recall_acc)
        print('Iteration', iteration, ': recall score =', recall_acc)
    results_table.ix[j, 'Mean recall score'] = np.mean(recall_accs)
    j += 1
    print('Mean recall score:', np.mean(recall_accs))
best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C value']
print('Best C value from cross-validation:', best_c)

Error message:

ValueError Traceback (most recent call last)
<ipython-input-16-dd737f3979b5> in <module>()
14 X_train = X_train.reset_index(drop=True)
15 y_train = y_train.reset_index(drop=True)
---> 16 lr.fit(X_train.iloc[indices[0],:], y_train.iloc[indices[0]].values.ravel())
17 y_pred = lr.predict(X_train.iloc[indices[1],:].values)
18 recall_acc = recall_score(y_train.iloc[indices[1]].values, y_pred)

D:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
1491 The SAGA solver supports both float64 and float32 bit arrays.
1492 """
-> 1493 solver = _check_solver(self.solver, self.penalty, self.dual)
1494
1495 if not isinstance(self.C, numbers.Number) or self.C < 0:

D:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py in _check_solver(solver, penalty, dual)
440 if penalty not in all_penalties:
441 raise ValueError("Logistic Regression supports only penalties in %s,"
--> 442 " got %s." % (all_penalties, penalty))
443
444 if solver not in ['liblinear', 'saga'] and penalty not in ('l2', 'none'):

ValueError: Logistic Regression supports only penalties in ['l1', 'l2', 'elasticnet', 'none'], got 12.

Could someone please help?

Answer A: The parameter penalty = '12' is wrong. Logistic regression only supports L1 and L2 regularization, so the value should be the string 'l1' or 'l2' (lowercase letter L, not the digit 1). It looks like a typo.
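
A minimal sketch of the corrected call, assuming LR is sklearn's LogisticRegression as the traceback suggests (in recent scikit-learn versions the 'l1' penalty also requires the liblinear or saga solver, and results_table.ix has been removed from newer pandas in favour of results_table.loc):

    from sklearn.linear_model import LogisticRegression as LR

    # penalty expects the string 'l2' (or 'l1'), i.e. the lowercase letter L, not the digits '12'
    lr = LR(C=0.01, penalty='l2', solver='liblinear')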

Implementing jackknife cross-validation in Python

Leave-one-out means that only a single sample from the original data set is used as the validation set, while the remaining data serve as the training set. Essentially, leave-one-out is no different from the jackknife, so the implementation code for the jackknife (leave-one-out) is given below.

Theoretical background:
The jackknife was proposed by Quenouille (1949), and Tukey (1958) coined the term "jackknife". It is a resampling method whose original motivation was to reduce the bias of an estimator.
More concretely, for a population with an unknown distribution, drawing a sample of size N and estimating the population parameter A with the sample statistic a introduces some error, especially for small samples. To address this, one computes the statistic on the sample with the i-th observation removed (that is, remove one observation x_i, use the remaining data as the training set, and use x_i as the validation set).
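
For example, a minimal sketch of the jackknife bias-corrected estimate of a statistic, assuming the sample sits in a 1-D NumPy array:

    import numpy as np

    def jackknife_bias_corrected(data, statistic=np.mean):
        # statistic computed on the full sample of size n
        n = len(data)
        theta_full = statistic(data)
        # leave-one-out replicates: statistic with the i-th observation removed
        theta_loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
        # jackknife estimate of the bias and the bias-corrected statistic
        bias = (n - 1) * (theta_loo.mean() - theta_full)
        return theta_full - bias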

The leave-one-out cross-validation code is as follows:

    from sklearn.model_selection import LeaveOneOut
    from sklearn.neighbors import KNeighborsClassifier

    # x_data (feature array) and labels (target array) are assumed to be NumPy arrays
    prediction_list = []
    real_list = []
    #### LOOCV
    loo = LeaveOneOut()  # build the leave-one-out splitter
    loo.get_n_splits(x_data)
    for train_index, test_index in loo.split(x_data):
        X_train, X_test = x_data[train_index], x_data[test_index]
        y_train, y_test = labels[train_index], labels[test_index]
        # fit a 1-nearest-neighbour classifier on everything except the held-out sample
        knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
        predicted_y = knn.predict(X_test)
        prediction_list.append(predicted_y)
        real_list.append(y_test)

The above implements a leave-one-out procedure for KNN classification; other models follow a similar pattern and can be applied with only minor modifications.
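
For instance, a minimal sketch of how the collected predictions could be scored afterwards, assuming sklearn's accuracy_score (any other metric such as recall_score works the same way):

    import numpy as np
    from sklearn.metrics import accuracy_score

    # each element holds a single prediction/label, so concatenate them into flat vectors
    y_pred = np.concatenate(prediction_list)
    y_true = np.concatenate(real_list)
    print('LOOCV accuracy:', accuracy_score(y_true, y_pred))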
