How can I improve the quality/graphics of my R plot for a Naive Bayes classifier visual
Posted: 2021-07-27 23:22:39

Question:
I tried a Naive Bayes classifier to see whether I could predict if a person will purchase a particular vehicle based on their age and estimated salary. The plot I get in the visualisation step does not look smooth or clean, and there are white lines running through it. I assume the graphics/resolution is the problem, but I'm not sure.
Here is a snippet of what the data set looks like:
Age  EstimatedSalary  Purchased
 19            19000          0
 35            20000          0
 26            43000          0
 27            57000          0
 19            76000          0
 27            58000          0
Here is the code:
# Loading the data set
data <- read.csv(" *A csv sheet on people's age, salaries and whether or not they will purchase a certain vehicle* ")
data <- data[, 3:5]  # keep Age, EstimatedSalary and Purchased
# Encoding the dependent variable as a factor
data$Purchased <- factor(data$Purchased, levels = c(0, 1))
# Splitting the data set (refer to the column directly instead of attach()ing the data frame)
library(caTools)
set.seed(404)
split <- sample.split(data$Purchased, SplitRatio = 0.75)
train_set <- subset(data, split == TRUE)
test_set <- subset(data, split == FALSE)
# Feature scaling (Age and EstimatedSalary only; column 3 is the label)
train_set[-3] <- scale(train_set[-3])
test_set[-3] <- scale(test_set[-3])
# Training the model
library(e1071)
classifier <- naiveBayes(x = train_set[-3], y = train_set$Purchased)
# Predicting the test set results
y_pred <- predict(classifier, newdata = test_set[-3])
# Constructing the confusion matrix
(cm <- table(test_set[, 3], y_pred))
Below is the code I used to visualise the results:
# Visualising the results
# (library(ElemStatLearn) dropped: nothing below uses it)
set <- test_set
# Fine grid over the scaled feature ranges, padded by 1 on each side
x1 <- seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
x2 <- seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set <- expand.grid(x1, x2)
colnames(grid_set) <- c("Age", "EstimatedSalary")
# Predicted class for every grid point
y_grid <- predict(classifier, newdata = grid_set)
plot(set[, -3], main = "Naive Bayes: Test set", xlab = "Age", ylab = "EstimatedSalary", xlim = range(x1), ylim = range(x2))
contour(x1, x2, matrix(as.numeric(y_grid), length(x1), length(x2)), add = TRUE)
# One "." marker per grid point, coloured by the predicted class
points(grid_set, pch = ".", col = ifelse(y_grid == 1, "Springgreen3", "tomato"))
# Test-set observations, filled by their true class
points(set, pch = 21, bg = ifelse(set[, 3] == 1, "green4", "red3"))
[Figure: Naive Bayes classifier plot on the test set predictions]
Why do white lines run up and down the plot, and why doesn't it look smooth?
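A plausible reading of the white lines, offered as a hedged explanation rather than a confirmed diagnosis: the background is not a filled area but roughly 360,000 separate `pch = "."` markers, one per grid point. According to `?points`, `pch = "."` is a square of side 0.01 inch scaled by `cex` (at least one pixel when `cex = 1`). With a grid step of 0.01 in scaled units, neighbouring grid points end up about 0.01 inch apart on a typical plot, i.e. about the same size as the marker. On a screen at roughly 96 pixels per inch both are about one pixel, so when marker positions are rounded to whole pixels some pixel rows and columns most likely get no marker, and those show up as white lines. The sketch below measures this on an off-screen PDF device; the feature ranges are hypothetical stand-ins for the scaled test set, and `dot_side_in` is a hypothetical helper that only encodes the `?points` rule.

```r
# Sketch: compare the spacing of the background grid with the size of a
# pch = "." marker. Feature ranges are hypothetical stand-ins for the scaled
# test set; the 7 x 7 inch PDF device is an arbitrary, off-screen choice.
x1 <- seq(-2 - 1, 2 + 1, by = 0.01)      # Age grid, built like the question's seq()
x2 <- seq(-1.5 - 1, 2.5 + 1, by = 0.01)  # EstimatedSalary grid
n_dots <- length(x1) * length(x2)        # one "." marker per grid point

pdf(file = tempfile(fileext = ".pdf"), width = 7, height = 7)
plot(range(x1), range(x2), type = "n", xlab = "Age", ylab = "EstimatedSalary")
usr <- par("usr")  # user-coordinate limits of the plot region
pin <- par("pin")  # plot-region width and height in inches
invisible(dev.off())

# Horizontal distance between neighbouring grid points, in inches
grid_step_in <- 0.01 * pin[1] / (usr[2] - usr[1])
# Hypothetical helper: side of a pch = "." square, 0.01 inch scaled by cex (?points)
dot_side_in <- function(cex) 0.01 * cex

cat(sprintf("%d dots; grid step %.4f in; dot side %.2f in at cex = 1, %.2f in at cex = 5\n",
            n_dots, grid_step_in, dot_side_in(1), dot_side_in(5)))
```

At `cex = 1` the marker and the grid step come out at about the same size, so whether the squares close up depends on how the device rounds them to pixels. At `cex = 5` each marker is several grid steps wide on this device, so neighbouring markers overlap and the gaps disappear.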
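Answer 1:
So I figured out what was causing the strange lines and the low-quality look. Adding the argument "cex = n" to the points() call that draws the background grid, with n = 5, solved the problem.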
Modified code block:
set <- test_set
x1 <- seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
x2 <- seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set <- expand.grid(x1, x2)
colnames(grid_set) <- c("Age", "EstimatedSalary")
y_grid <- predict(classifier, newdata = grid_set)
plot(set[, -3], main = "Naive Bayes: Test set", xlab = "Age", ylab = "EstimatedSalary", xlim = range(x1), ylim = range(x2))
contour(x1, x2, matrix(as.numeric(y_grid), length(x1), length(x2)), add = TRUE)
# cex = 5 draws each "." marker five times larger than the default
points(grid_set, pch = ".", col = ifelse(y_grid == 1, "Springgreen3", "tomato"), cex = 5)
points(set, pch = 21, bg = ifelse(set[, 3] == 1, "green4", "red3"))
The line that changed in the code block above:
points(grid_set, pch = ".", col = ifelse(y_grid == 1, "Springgreen3", "tomato"), cex = 5)
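However, I would still like to know why this happens, since R's documentation of the functions and their arguments isn't very clear to me.
Any help would be appreciated!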
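A device-independent alternative to tuning `cex` is to paint the decision regions with `image()`, which fills one rectangle per grid cell, so there are no gaps to cover regardless of window size or resolution. This is a minimal sketch, not the answer's method: the feature ranges are hypothetical, and the linear rule standing in for `predict(classifier, newdata = grid_set)` is a made-up placeholder so the block runs without the data or `e1071`.

```r
# Grid over hypothetical scaled feature ranges (same construction as the question)
x1 <- seq(-3, 3, by = 0.01)
x2 <- seq(-2.5, 3.5, by = 0.01)
grid_set <- expand.grid(Age = x1, EstimatedSalary = x2)

# Hypothetical stand-in for predict(classifier, newdata = grid_set):
# class 1 when the two scaled features sum to more than 1
y_grid <- factor(as.integer(grid_set$Age + grid_set$EstimatedSalary > 1),
                 levels = c(0, 1))

# expand.grid() varies x1 fastest, so filling column-wise gives z[i, j] <-> (x1[i], x2[j])
z <- matrix(as.numeric(y_grid), nrow = length(x1), ncol = length(x2))

pdf(file = tempfile(fileext = ".pdf"), width = 7, height = 7)
# One filled cell per grid point: class 0 (z = 1) tomato, class 1 (z = 2) springgreen3
image(x1, x2, z, col = c("tomato", "springgreen3"), useRaster = TRUE,
      main = "Naive Bayes: Test set", xlab = "Age", ylab = "EstimatedSalary")
# Decision boundary halfway between the two class codes
contour(x1, x2, z, levels = 1.5, drawlabels = FALSE, add = TRUE)
invisible(dev.off())
```

With the question's objects, `z` would be the same `matrix(as.numeric(y_grid), length(x1), length(x2))` already passed to `contour()`, and the test-set observations can be added on top with the existing `points(set, pch = 21, ...)` call. `useRaster = TRUE` draws the regular grid as a single bitmap, which also keeps the output file small.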