如何在python中从头开始获取kfold拆分以进行交叉验证?
Posted
技术标签:
【中文标题】如何在python中从头开始获取kfold拆分以进行交叉验证?【英文标题】:How to get kfold splits for cross validation from scratch in python? 【发布时间】:2018-07-25 05:48:12 【问题描述】:我认为我已经将我的训练数据分成 5 kold,有没有办法让我标记/识别 5 个拆分中的每一个,以便我可以将每个拆分发送到我的算法中以计算它们自己的准确度?
from sklearn.model_selection import KFold
kf = KFold(n_splits=5)
splits=kf.get_n_splits(X_train)
print(splits)
另外,我还尝试拆分我的数据,然后在我的逻辑回归中运行,但这会输出 nan % 准确度:
X_train1 = X[0:84]
Y_train1 = Y[0:84]
X_train2 = X[85:170]
Y_train2 = Y[85:170]
X_train3 = X[171:255]
Y_train3 = Y[171:255]
X_train4 = X[256:340]
Y_train4 = Y[256:340]
X_train5 = X[341:426]
Y_train5 = Y[341:426]
def Sigmoid(z):
return 1/(1 + np.exp(-z))
def Hypothesis(theta, x):
return Sigmoid(x @ theta)
def Cost_Function(X,Y,theta,m):
hi = Hypothesis(theta, x)
_y = Y.reshape(-1, 1)
J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
return J
def Cost_Function_Regularisation(X,Y,theta,m,alpha):
hi = Hypothesis(theta,X)
_y = Y.reshape(-1, 1)
J = alpha/float(m) * X.T @ (hi - _y)
return J
def Cost_Function_Regularisation(X,Y,theta,m,alpha):
hi = Hypothesis(theta,X)
_y = Y.reshape(-1, 1)
J = alpha/float(m) * X.T @ (hi - _y)
return J
def Gradient_Descent(X,Y,theta,m,alpha):
new_theta = theta - Cost_Function_Regularisation(X,Y,theta,m,alpha)
return new_theta
def Accuracy(theta):
correct = 0
length = len(X_test)
prediction = (Hypothesis(theta, X_test) > 0.5)
_y = Y_test.reshape(-1, 1)
correct = prediction == _y
my_accuracy = (np.sum(correct) / length)*100
print ('LR Accuracy CV: ', my_accuracy, "%")
def Logistic_Regression(X,Y,alpha,theta,num_iters):
m = len(Y)
for x in range(num_iters):
new_theta = Gradient_Descent(X,Y,theta,m,alpha)
theta = new_theta
if x % 100 == 0:
print #('theta: ', theta)
print #('cost: ', Cost_Function(X,Y,theta,m))
Accuracy(theta)
ep = .012
initial_theta = np.random.rand(X_train.shape[1],1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train1,Y_train1,alpha,initial_theta,iterations)
Logistic_Regression(X_train2,Y_train2,alpha,initial_theta,iterations)
Logistic_Regression(X_train3,Y_train3,alpha,initial_theta,iterations)
Logistic_Regression(X_train4,Y_train4,alpha,initial_theta,iterations)
Logistic_Regression(X_train5,Y_train5,alpha,initial_theta,iterations
【问题讨论】:
【参考方案1】:get_n_splits
返回您为 skf 配置的“拆分次数”。
查看此处的文档以获取示例:http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html
【讨论】:
以上是关于如何在python中从头开始获取kfold拆分以进行交叉验证?的主要内容,如果未能解决你的问题,请参考以下文章
使用KFold进行训练集和验证集的拆分,使用准确率和召回率来挑选合适的阈值(threshold) 1.KFold(进行交叉验证) 2.np.logical_and(两bool数组都是正即为正)
python 参考:[如何在Python中从头开始实现随机森林](http://machinelearningmastery.com/implement-random-forest-scratch-p