Error in na.fail.default(list(Purchase = c("CH", "CH", "CH", "MM", "CH", : missing values in object

Posted Data+Science+Insight


Contents

Error in na.fail.default(list(Purchase = c("CH", "CH", "CH", "MM", "CH",  :  missing values in object

Problem:

Solution:

Full error:


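Problem:

The data contains missing values, so training fails. The formula interface of caret's train() defaults to na.action = na.fail, which stops with this error as soon as any row of the training data contains an NA.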
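The mechanism can be reproduced without caret. The following is a minimal base-R sketch on a made-up data frame (the `toy` data and its values are assumptions for illustration, not the orange juice dataset). It shows `na.fail()` raising the same message and one way to find the columns that hold the missing values.

```r
# Toy data frame with one missing predictor value (hypothetical data)
toy <- data.frame(
  Purchase = c("CH", "MM", "CH", "MM"),
  PriceCH  = c(1.75, NA, 1.86, 1.69)
)

# na.fail() returns the object unchanged if it is complete,
# and stops with an error as soon as any row contains an NA
msg <- tryCatch(
  {
    na.fail(toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Count missing values per column to locate the problem
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running `colSums(is.na(trainData))` on the real training set in the same way should show which predictors need imputing.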

# Script that reproduces the error

# install.packages(c('caret', 'skimr', 'RANN', 'randomForest', 'fastAdaboost', 'gbm', 'xgboost', 'caretEnsemble', 'C50', 'earth'))
 
# Load the caret package
library(caret)
 
# Import dataset
orange <- read.csv('https://raw.githubusercontent.com/selva86/datasets/master/orange_juice_withmissing.csv')
 
# Structure of the dataframe
str(orange)
 
# See top 6 rows and 10 columns
head(orange[, 1:10])


# Create the training and test datasets
set.seed(100)
 
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(orange$Purchase, p=0.8, list=FALSE)
 
# Step 2: Create the training  dataset
trainData <- orange[trainRowNumbers,]
 
# Step 3: Create the test dataset
testData <- orange[-trainRowNumbers,]
 
# Store X and Y for later use.
x = trainData[, 2:18]
y = trainData$Purchase

 
# Define the training control
fitControl <- trainControl(
    method = 'cv',                   # k-fold cross validation
    number = 5,                      # number of folds
    savePredictions = 'final',       # saves predictions for optimal tuning parameter
    classProbs = TRUE,               # should class probabilities be returned
    summaryFunction=twoClassSummary  # results summary function
) 


# Step 1: Tune hyper parameters by setting tuneLength
# This call fails with the na.fail error because trainData still contains NAs
set.seed(100)
model_mars2 = train(Purchase ~ ., data=trainData, method='earth', tuneLength = 5, metric='ROC', trControl = fitControl)
model_mars2

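Solution:

# Data preprocessing: impute missing values

library(RANN)  # required for knnImpute

# Fit a k-nearest-neighbours imputation model on the training data
# (knnImpute also centers and scales the numeric predictors)
preProcess_missingdata_model <- preProcess(trainData, method='knnImpute')
preProcess_missingdata_model

# Apply the model to the training data and confirm that no NAs remain
trainData <- predict(preProcess_missingdata_model, newdata = trainData)
anyNA(trainData)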
 

# Full script with the fix applied

# install.packages(c('caret', 'skimr', 'RANN', 'randomForest', 'fastAdaboost', 'gbm', 'xgboost', 'caretEnsemble', 'C50', 'earth'))
 
# Load the caret package
library(caret)
 
# Import dataset
orange <- read.csv('https://raw.githubusercontent.com/selva86/datasets/master/orange_juice_withmissing.csv')
 
# Structure of the dataframe
str(orange)
 
# See top 6 rows and 10 columns
head(orange[, 1:10])


# Create the training and test datasets
set.seed(100)
 
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(orange$Purchase, p=0.8, list=FALSE)
 
# Step 2: Create the training  dataset
trainData <- orange[trainRowNumbers,]
 
# Step 3: Create the test dataset
testData <- orange[-trainRowNumbers,]
 
# Store X and Y for later use.
x = trainData[, 2:18]
y = trainData$Purchase



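# Data preprocessing: impute missing values

library(RANN)  # required for knnImpute

# Fit a k-nearest-neighbours imputation model on the training data
# (knnImpute also centers and scales the numeric predictors)
preProcess_missingdata_model <- preProcess(trainData, method='knnImpute')
preProcess_missingdata_model

# Apply the model to the training data and confirm that no NAs remain
trainData <- predict(preProcess_missingdata_model, newdata = trainData)
anyNA(trainData)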


 
# Define the training control
fitControl <- trainControl(
    method = 'cv',                   # k-fold cross validation
    number = 5,                      # number of folds
    savePredictions = 'final',       # saves predictions for optimal tuning parameter
    classProbs = TRUE,               # should class probabilities be returned
    summaryFunction=twoClassSummary  # results summary function
) 


# Step 1: Tune hyper parameters by setting tuneLength
set.seed(100)
model_mars2 = train(Purchase ~ ., data=trainData, method='earth', tuneLength = 5, metric='ROC', trControl = fitControl)
model_mars2

> model_mars2
Multivariate Adaptive Regression Spline 

857 samples
 17 predictor
  2 classes: 'CH', 'MM' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 685, 686, 685, 686, 686 
Resampling results across tuning parameters:

  nprune  ROC        Sens       Spec     
   2      0.8837092  0.8757143  0.7094075
   5      0.9045666  0.8756960  0.7483944
   9      0.8958465  0.8776190  0.7483039
  13      0.8942303  0.8719048  0.7513342
  17      0.8942303  0.8719048  0.7513342

Tuning parameter 'degree' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were nprune = 5 and degree = 1.
>

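Full error:

Note that the console output below shows a different error from the one in the title: testData4 is never defined in the script above, so predict() fails, and predicted2 is then missing as well.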

> # Step 2: Predict on testData and Compute the confusion matrix
> predicted2 <- predict(model_mars2, testData4)
Error in predict.train(model_mars2, testData4) : 
  object 'testData4' not found
> confusionMatrix(reference = testData$Purchase, data = predicted2, mode='everything', positive='MM')
Error in confusionMatrix(reference = testData$Purchase, data = predicted2,  : 
  object 'predicted2' not found
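The failed predict() call points at a step the script does not show: the test set may also contain missing values, and it should be prepared with whatever was learned from the training set rather than refitted on its own. The following base-R sketch illustrates that idea only. It swaps in simple median imputation in place of caret's `knnImpute`, and the toy `train`/`test` data frames and the `impute_median` helper are hypothetical names introduced here.

```r
# Hypothetical toy train/test split with missing values
train <- data.frame(x1 = c(1, 2, NA, 4, 100), x2 = c(10, NA, 30, 40, 50))
test  <- data.frame(x1 = c(NA, 3), x2 = c(20, NA))

# Learn the per-column medians from the training set only
train_medians <- vapply(train, median, numeric(1), na.rm = TRUE)

# Fill the NAs of a data frame using previously learned medians
impute_median <- function(df, medians) {
  for (col in names(medians)) {
    df[[col]][is.na(df[[col]])] <- medians[[col]]
  }
  df
}

# Apply the same learned values to both sets
train_imputed <- impute_median(train, train_medians)
test_imputed  <- impute_median(test, train_medians)

print(anyNA(train_imputed))
print(anyNA(test_imputed))
```

With caret, the analogous step would presumably be to pass testData through predict(preProcess_missingdata_model, newdata = testData) and hand the result to predict(model_mars2, ...); that variant is an assumption and is not shown in the original post.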
