Error in na.fail.default(list(Purchase = c("CH", "CH", "CH", "MM", "CH", : missing values in object
Error in na.fail.default(list(Purchase = c("CH", "CH", "CH", "MM", "CH", : missing values in object
Problem:
The data contains missing values, so training fails. train()'s formula interface uses na.action = na.fail by default, which stops with this error as soon as any predictor column contains NA.
#
# install.packages(c('caret', 'skimr', 'RANN', 'randomForest', 'fastAdaboost', 'gbm', 'xgboost', 'caretEnsemble', 'C50', 'earth'))
# Load the caret package
library(caret)
# Import dataset
orange <- read.csv('https://raw.githubusercontent.com/selva86/datasets/master/orange_juice_withmissing.csv')
# Structure of the dataframe
str(orange)
# See top 6 rows and 10 columns
head(orange[, 1:10])
# Create the training and test datasets
set.seed(100)
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(orange$Purchase, p=0.8, list=FALSE)
# Step 2: Create the training dataset
trainData <- orange[trainRowNumbers,]
# Step 3: Create the test dataset
testData <- orange[-trainRowNumbers,]
# Store X and Y for later use.
x = trainData[, 2:18]
y = trainData$Purchase
# Define the training control
fitControl <- trainControl(
  method = 'cv',                     # k-fold cross validation
  number = 5,                        # number of folds
  savePredictions = 'final',         # saves predictions for optimal tuning parameter
  classProbs = T,                    # should class probabilities be returned
  summaryFunction = twoClassSummary  # results summary function
)
# Step 1: Tune hyper parameters by setting tuneLength
set.seed(100)
model_mars2 = train(Purchase ~ ., data=trainData, method='earth', tuneLength = 5, metric='ROC', trControl = fitControl)
model_mars2
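Before fixing anything, it helps to confirm where the NAs are. A minimal check (not part of the original script), run on trainData before calling train():
# Count missing values per column; any non-zero count is enough to trigger
# na.fail() when train() is called through the formula interface
colSums(is.na(trainData))
# Quick yes/no check for the whole data frame
anyNA(trainData)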
Solution:
# Data preprocessing - impute missing values
preProcess_missingdata_model <- preProcess(trainData, method='knnImpute')
preProcess_missingdata_model
library(RANN) # required for knnImpute
trainData <- predict(preProcess_missingdata_model, newdata = trainData)
anyNA(trainData)
#
# install.packages(c('caret', 'skimr', 'RANN', 'randomForest', 'fastAdaboost', 'gbm', 'xgboost', 'caretEnsemble', 'C50', 'earth'))
# Load the caret package
library(caret)
# Import dataset
orange <- read.csv('https://raw.githubusercontent.com/selva86/datasets/master/orange_juice_withmissing.csv')
# Structure of the dataframe
str(orange)
# See top 6 rows and 10 columns
head(orange[, 1:10])
# Create the training and test datasets
set.seed(100)
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(orange$Purchase, p=0.8, list=FALSE)
# Step 2: Create the training dataset
trainData <- orange[trainRowNumbers,]
# Step 3: Create the test dataset
testData <- orange[-trainRowNumbers,]
# Store X and Y for later use.
x = trainData[, 2:18]
y = trainData$Purchase
# Data preprocessing - impute missing values
preProcess_missingdata_model <- preProcess(trainData, method='knnImpute')
preProcess_missingdata_model
library(RANN) # required for knnImpute
trainData <- predict(preProcess_missingdata_model, newdata = trainData)
anyNA(trainData)
# Define the training control
fitControl <- trainControl(
  method = 'cv',                     # k-fold cross validation
  number = 5,                        # number of folds
  savePredictions = 'final',         # saves predictions for optimal tuning parameter
  classProbs = T,                    # should class probabilities be returned
  summaryFunction = twoClassSummary  # results summary function
)
# Step 1: Tune hyper parameters by setting tuneLength
set.seed(100)
model_mars2 = train(Purchase ~ ., data=trainData, method='earth', tuneLength = 5, metric='ROC', trControl = fitControl)
model_mars2
> model_mars2
Multivariate Adaptive Regression Spline
857 samples
17 predictor
2 classes: 'CH', 'MM'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 685, 686, 685, 686, 686
Resampling results across tuning parameters:
  nprune  ROC        Sens       Spec
   2      0.8837092  0.8757143  0.7094075
   5      0.9045666  0.8756960  0.7483944
   9      0.8958465  0.8776190  0.7483039
  13      0.8942303  0.8719048  0.7513342
  17      0.8942303  0.8719048  0.7513342
Tuning parameter 'degree' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were nprune = 5 and degree = 1.
>
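As an alternative to imputing the whole training set up front (a sketch, not taken from the original post; model_mars_alt is a hypothetical name), train() can also run the k-NN imputation inside each resample via its preProcess argument; na.action = na.pass is needed so the formula interface does not stop at its default na.fail:
# Assumes trainData is the raw (un-imputed) split and fitControl is defined as above
set.seed(100)
model_mars_alt <- train(Purchase ~ ., data = trainData,
                        method = 'earth',
                        tuneLength = 5,
                        metric = 'ROC',
                        trControl = fitControl,
                        preProcess = 'knnImpute',  # k-NN imputation (also centers and scales)
                        na.action = na.pass)       # let NAs through to the preProcess step
model_mars_alt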
Complete error (from the prediction step):
> # Step 2: Predict on testData and Compute the confusion matrix
> predicted2 <- predict(model_mars2, testData4)
Error in predict.train(model_mars2, testData4) :
object 'testData4' not found
> confusionMatrix(reference = testData$Purchase, data = predicted2, mode='everything', positive='MM')
Error in confusionMatrix(reference = testData$Purchase, data = predicted2, :
object 'predicted2' not found
>
>
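The error above is not about missing values at all: testData4 is simply a typo for testData. Before predicting, the test split also has to be passed through the same imputation model that was fitted on the training data. A minimal sketch, assuming the objects created above (testData_imputed is a hypothetical name):
# Apply the imputation model fitted on trainData to the test data
testData_imputed <- predict(preProcess_missingdata_model, newdata = testData)
# Step 2: Predict on the imputed test data and compute the confusion matrix
predicted2 <- predict(model_mars2, testData_imputed)
confusionMatrix(reference = factor(testData$Purchase), data = predicted2,
                mode = 'everything', positive = 'MM')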