Adding ellipses to a principal component analysis (PCA) plot


Posted: 2012-12-05 20:11:08

Question:

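I am having trouble adding grouping-variable ellipses on top of an individual-site PCA factor plot that also includes PCA variable factor arrows.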

My code:

prin_comp<-rda(data[,2:9], scale=TRUE)
pca_scores<-scores(prin_comp)

#sites=individual site PC1 & PC2 scores, Waterbody=Row Grouping Variable.
#site scores in the PCA plot are stratified by Waterbody type.

plot(pca_scores$sites[,1],
     pca_scores$sites[,2],
     pch=21,
     bg=point_colors[data$Waterbody],
     xlim=c(-2,2), 
     ylim=c(-2,2),
     xlab=x_axis_text,
     ylab=y_axis_text)

#species=column PCA1 & PCA2 Response variables
arrows(0,0,pca_scores$species[,1],pca_scores$species[,2],lwd=1,length=0.2)

#I want to draw 'Waterbody' grouping-variable ellipses that completely encompass
# their respective individual site scores (to visualise total error/variance).

I have tried the dataellipse, plotellipses and ellipse functions, all without success. Ignorance wins at this point. Let me know if I haven't provided enough information.

Data (log10-transformed):

dput(data)

structure(list(Waterbody = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Ditch", "Garden Pond", 
"Peri-Urban Ponds", "River", "Stream"), class = "factor"), Catchment_Size = c(9.73045926, 
9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 
9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 
9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 9.73045926, 
9.73045926, 8.602059991, 8.602059991, 8.602059991, 8.602059991, 
8.602059991, 8.602059991, 8.602059991, 8.602059991, 8.602059991, 
8.602059991, 8.602059991, 8.602059991, 8.602059991, 8.602059991, 
8.602059991, 8.602059991, 8.602059991, 8.602059991, 8.602059991, 
8.602059991, 5.230525555, 5.271197816, 5.310342762, 5.674064357, 
5.745077916, 5.733059168, 5.90789752, 5.969640923, 0, 0, 0.419955748, 
0, 0.079181246, 0, 0.274157849, 0, 0.301029996, 1, 0.62838893, 
0.243038049, 0, 0, 0, 1.183269844, 0, 1.105510185, 0, 0.698970004, 
2, 1.079181246, 2.954242509, 1.84509804, 1.477121255, 2.477121255, 
3.662757832, 1.397940009, 1.84509804, 0), pH = c(0.888740961, 
0.891537458, 0.890421019, 0.904715545, 0.886490725, 0.88592634, 
0.892651034, 0.891537458, 0.895422546, 0.8876173, 0.881384657, 
0.888179494, 0.876794976, 0.898725182, 0.894316063, 0.882524538, 
0.881384657, 0.916980047, 0.890979597, 0.886490725, 0.88592634, 
0.903089987, 0.889301703, 0.897627091, 0.896526217, 0.890979597, 
0.927370363, 0.904174368, 0.907948522, 0.890979597, 0.910090546, 
0.892094603, 0.896526217, 0.891537458, 0.894869657, 0.894316063, 
0.898725182, 0.914343157, 0.923244019, 0.905256049, 0.870988814, 
0.868644438, 0.872156273, 0.874481818, 0.88422877, 0.876217841, 
0.874481818, 0.8876173, 0.859138297, 0.887054378, 0.856124444, 
0.856124444, 0.860936621, 0.903089987, 0.860338007, 0.8876173, 
0.860338007, 0.906335042, 0.922206277, 0.851869601, 0.862131379, 
0.868056362, 0.869818208, 0.861534411, 0.875061263, 0.852479994, 
0.868644438, 0.898725182, 0.870403905, 0.88422877, 0.867467488, 
0.905256049, 0.88536122, 0.8876173, 0.876794976, 0.914871818, 
0.899820502, 0.946943271), Conductivity = c(2.818885415, 2.824125834, 
2.824776462, 2.829303773, 2.824125834, 2.82672252, 2.829303773, 
2.82672252, 2.824776462, 2.829946696, 2.846337112, 2.862727528, 
2.845718018, 2.848804701, 2.86923172, 2.85308953, 2.867467488, 
2.847572659, 2.86569606, 2.849419414, 2.504606771, 2.506775537, 
2.691346764, 2.628797486, 2.505963518, 2.48756256, 2.501470072, 
2.488973525, 2.457124626, 2.778295991, 2.237040791, 2.429267666, 
2.3287872, 2.461198289, 2.384174139, 2.386320574, 2.410608543, 
2.404320467, 2.426836454, 2.448397103, 2.768704683, 2.76718556, 
2.771602178, 2.775289978, 2.90579588, 2.909020854, 3.007747778, 
3.017867719, 2.287129621, 2.099680641, 2.169674434, 1.980457892, 
2.741781696, 2.597804842, 2.607669437, 2.419129308, 2.786751422, 
2.639884742, 2.19893187, 2.683497318, 2.585235063, 2.393048466, 
2.562411833, 2.785329835, 2.726808683, 2.824776462, 2.699056855, 
2.585122186, 2.84260924, 2.94792362, 2.877371346, 2.352568386, 
2.202760687, 2.819543936, 2.822168079, 2.426348574, 2.495683068, 
2.731266349), NO3 = c(1.366236124, 1.366236124, 1.376029182, 
1.385606274, 1.376029182, 1.385606274, 1.385606274, 1.385606274, 
1.376029182, 1.385606274, 1.458637849, 1.489114369, 1.482158695, 
1.496098992, 1.502290528, 1.50174373, 1.500785173, 1.499549626, 
1.485721426, 1.490520309, 0.693726949, 0.693726949, 1.246005904, 
1.159266331, 0.652246341, 0.652246341, 0.883093359, 0.85672889, 
0.828659897, 1.131297797, 0.555094449, 0.85672889, 0.731588765, 
0.883093359, 0.731588765, 0.731588765, 0.693726949, 0.693726949, 
0.693726949, 0.693726949, 1.278524965, 1.210853365, 1.318480725, 
1.308777774, 1.404833717, 1.412796429, 0, 0, 0, 0, 0, 0, 1.204391332, 
0, 0, 0, 0.804820679, 0, 0, 0.021189299, 0, 0, 0.012837225, 0, 
0, 0, 0, 0.539076099, 0, 0, 1.619406411, 0, 0, 1.380753771, 0, 
0, 0, 0.931966115), NH4 = c(0.14, 0.14, 0.18, 0.19, 0.2, 0.2, 
0.15, 0.14, 0.11, 0.11, 0.04, 0.06, 0.04, 0.03, 0.07, 0.03, 0.03, 
0.04, 0.04, 0.03, 0.01, 0, 0, 0.01, 0.02, 0.02, 0.05, 0.03, 0.04, 
0.02, 0.21, 0.19, 0.2, 0.1, 0.05, 0.05, 0.08, 0.11, 0.04, 0.04, 
0.15, 2.03, 0.14, 0.09, 0.05, 0.04, 2.82, 3.18, 0.06, 0.12, 2.06, 
0.1, 0.14, 0.06, 1.06, 0.03, 0.04, 0.03, 0.03, 1.91, 0.2, 1.35, 
0.69, 0.05, 0.17, 3.18, 0.21, 0.1, 0.03, 1.18, 0.01, 0.03, 0.02, 
0.09, 0.14, 0.02, 0.07, 0.17), SRP = c(0.213348889, 0.221951667, 
0.24776, 0.228833889, 0.232275, 0.249480556, 0.259803889, 0.244318889, 
0.249480556, 0.240877778, 0.314861667, 0.292494444, 0.311420556, 
0.306258889, 0.285612222, 0.323464444, 0.316582222, 0.34067, 
0.285612222, 0.321743889, 0.074328, 0.074328, 0.120783, 0.133171, 
0.0820705, 0.080522, 0.0789735, 0.0820705, 0.080522, 0.0913615, 
0.136268, 0.1656895, 0.1223315, 0.130074, 0.1192345, 0.1285255, 
0.1873685, 0.167238, 0.15485, 0.157947, 0.1378165, 0.1966595, 
0.198208, 0.241566, 0.037164, 0.0325185, 0.455259, 0.560557, 
0.07987, 0.02119, 0.02119, 0.03912, 0.36349, 0.40098, 0.04401, 
0.07172, 0.15322, 0.92421, 0.02282, 0.17604, 0.17767, 0.66667, 
0.28688, 0.03586, 0.17278, 0.07661, 0.10432, 1.12959, 0.0170335, 
0.0975555, 0.009291, 0.0263245, 0.037164, 0.2214355, 0.0449065, 
0.068134, 0.09291, 0.545072), Zn = c(0.802630077, 1.172124009, 
0.891565332, 0.600253919, 0.583912562, 0.962473516, 0.99881711, 
0.709787074, 1.139860204, 0.953730706, 0.945832806, 0.906270378, 
0.81663232, 0.912514323, 0.935073763, 1.032328597, 1.357197063, 
1.070662063, 0.51200361, 0.987514325, 1.433709044, 1.380974206, 
1.143661074, 0.999774108, 1.449654241, 1.366165106, 1.014239038, 
0.891258617, 0.703978825, 1.086487964, 1.503432481, 1.243241499, 
0.890504851, 0.291391053, 0, 0.802855789, 0.776316103, 0.927421695, 
0.421505212, 0.952099537, 0.688802331, 0.852504392, 0.773545103, 
1.006581553, 1.028229538, 0.880619259, 0.833408503, 1.038608242, 
1.107084413, 0.973967909, 2.135781222, 1.819197019, 1.629353525, 
1.163194184, 1.343286462, 1.273614642, 1.92374902, 1.70523233, 
1.377623112, 1.119971423, 1.461175762, 1.691856516, 1.661826878, 
1.104531494, 1.449455257, 1.092376721, 1.519029523, 1.553407226, 
1.52652924, 1.332876573, 1.293079563, 0.996734891, 1.590475126, 
1.525755949, 1.180418366, 0.712624451, 0.6739512, 0.585043155
), Mn = c(0.817367016, 0.799340549, 1.023910606, 1.012921546, 
0.821579028, 1.321888278, 1.115077717, 1.02031984, 1.135482491, 
1.073645046, 1.016866271, 1.052809328, 0.818423855, 0.836387419, 
1.151032582, 0.720490068, 1.03746634, 1.072580733, 1.041590047, 
0.979548375, 1.073168262, 1.134336511, 0.916137983, 0.641374945, 
1.083753378, 0.84441504, 0.547159121, 0.144262774, 1.084826417, 
0.674861141, 0.478566496, 1.211654401, 1.095518042, 0.387033701, 
0.647480773, 0.775828814, 0.533899101, 0.854548936, 0.755188586, 
0.714497409, 0.851808514, 0.390051496, 0.832508913, 1.222482357, 
1.477048866, 1.475147977, 2.127826941, 2.132205239, 1.639576128, 
1.155578931, 2.203783274, 1.148448404, 1.644586284, 1.122609024, 
1.577319427, 1.633417953, 1.583901241, 1.215478936, 1.135418905, 
1.612847407, 1.95593777, 1.783639208, 1.567837703, 2.251767151, 
0.992155711, 1.738923187, 0.681964459, 0.852845818, 1.77749932, 
2.465019796, 0.887729797, 0.610447221, 1.777760209, 1.034588354, 
0.303196057, 1.793371249, 1.677734668, 1.802157753)), .Names = c("Waterbody", 
"Catchment_Size", "pH", "Conductivity", "NO3", "NH4", "SRP", 
"Zn", "Mn"), class = "data.frame", row.names = c("1_1", "1_2", 
"1_3", "1_4", "1_5", "1_6", "1_7", "1_8", "1_9", "1_10", "1_11", 
"1_12", "1_13", "1_14", "1_15", "1_16", "1_17", "1_18", "1_19", 
"1_20", "2_1", "2_2", "2_3", "2_4", "2_5", "2_6", "2_7", "2_8", 
"2_9", "2_10", "2_11", "2_12", "2_13", "2_14", "2_15", "2_16", 
"2_17", "2_18", "2_19", "2_20", "3_1", "3_2", "3_3", "3_4", "3_5", 
"3_6", "3_7", "3_8", "4_1", "4_2", "4_3", "4_4", "4_5", "4_6", 
"4_7", "4_8", "4_9", "4_10", "4_11", "4_12", "4_13", "4_14", 
"4_15", "4_16", "4_17", "4_18", "4_19", "4_20", "5_1", "5_2", 
"5_3", "5_4", "5_5", "5_6", "5_7", "5_8", "5_9", "5_10"))

Comments on the question:

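Welcome to Stack Overflow! Could you show a reproducible version of your data with dput(data)? (If your data is large, you may want to dput only a subset of the rows.) Also, is this the rda from the rda package? If so, where does the scale argument come from (it does not seem to be available in that function)?

I'm new to this, so please excuse my etiquette, David.

That's not a problem at all, Lewis; I just want to make sure we have what we need to help!

I downloaded the BiodiversityR package.

Answer 1: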

Since you didn't mention it in the question, I assume the package you are using is vegan, because it has a function rda() that accepts a scale=TRUE argument.

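I modified your original plot() call because some of the variables it uses (point_colors, x_axis_text, y_axis_text) were not given.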

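library(vegan)
# Unconstrained rda() is a PCA; scale=TRUE standardizes the variables in columns 2-9
prin_comp<-rda(data[,2:9], scale=TRUE)
pca_scores<-scores(prin_comp)   # list with $sites (rows) and $species (variables)

# Site scores on PC1/PC2, filled by Waterbody level (factor codes used as colour numbers)
plot(pca_scores$sites[,1],
     pca_scores$sites[,2],
     pch=21,
     bg=as.numeric(data$Waterbody),
     xlim=c(-2,2), 
     ylim=c(-2,2))
# Variable arrows drawn from the origin
arrows(0,0,pca_scores$species[,1],pca_scores$species[,2],lwd=1,length=0.2)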
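In vegan, rda() without constraints is a PCA, and scale = TRUE scales each variable to unit variance, so the ordination is based on correlations rather than covariances. As a quick base-R sanity check of that idea (a sketch that uses the built-in iris measurements as stand-in data, not the question's data set), prcomp(scale. = TRUE) gives component variances equal to the eigenvalues of the correlation matrix, and their total equals the number of standardized variables (it would be 8 for data[,2:9]):

```r
# Stand-in data: the four numeric iris columns (not the question's data)
X <- iris[, 1:4]

pca <- prcomp(X, scale. = TRUE)   # centre and scale each column to unit variance
eig <- eigen(cor(X))$values       # eigenvalues of the correlation matrix

# Component variances match the correlation-matrix eigenvalues
all.equal(pca$sdev^2, eig)        # TRUE
# Total variance equals the number of standardized variables
sum(pca$sdev^2)                   # 4
```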

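To draw the ellipses, use the function ordiellipse() from the vegan package. It takes the PCA result object and the grouping variable as arguments. The conf= argument sets the confidence level of the ellipses and therefore controls how many of the points they enclose.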

ordiellipse(prin_comp,data$Waterbody,conf=0.99)
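What conf does can be sketched in base R. Our reading of the vegan documentation is that, when conf is given, each group's standard-deviation ellipse is scaled by a value from the chi-squared distribution with 2 degrees of freedom. The hypothetical helper cov_ellipse() below is a minimal sketch of that standard construction, not vegan's own code, and it uses iris as stand-in data. Every point on the resulting ellipse lies at squared Mahalanobis distance qchisq(conf, 2) from the group mean. Such an ellipse is not guaranteed to contain every observed site. If the aim is an outline that encloses every site, as the question asks, vegan's ordihull() (convex hulls) is the closer match.

```r
# Hypothetical helper: points on a covariance ellipse scaled by a chi-square quantile
cov_ellipse <- function(center, cov, conf = 0.95, n = 100) {
  radius <- sqrt(qchisq(conf, df = 2))        # 2 df because the scores are 2-D
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))     # unit circle, one point per row
  e      <- eigen(cov, symmetric = TRUE)
  # Symmetric square root of the covariance matrix maps the circle onto the ellipse
  half   <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), 2) %*% t(e$vectors)
  sweep(radius * circle %*% half, 2, center, "+")   # shift to the group mean
}

# Example: 99% ellipse for two iris measurements of the setosa group
xy  <- as.matrix(iris[iris$Species == "setosa", 1:2])
ell <- cov_ellipse(colMeans(xy), cov(xy), conf = 0.99)

# Each ellipse point sits at squared Mahalanobis distance qchisq(0.99, 2)
range(mahalanobis(ell, colMeans(xy), cov(xy)))
```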

Comments:

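I would also suggest exploring the vegan library, as it has functions for plotting PCA results.

Thanks, Didzis. This is exactly what I was looking for. I believe the vegan package is downloaded automatically when the BiodiversityR package is installed. Many thanks to everyone for your input. Stack Overflow is a great find.

Yes, BiodiversityR depends on vegan (≥ 1.17-12).

@Lewis, if you found the answer you were looking for, please consider upvoting and accepting it (as explained here).

Answer 2: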

Here is a ggplot solution using the nice ggbiplot library. An obvious improvement over plot is the labelling in this one.

library(devtools) # on Windows, don't forget to install Rtools first
install_github("vqv/ggbiplot")

library(ggbiplot)
data.class <- data[,1]                          # Waterbody grouping factor
data.pca <- prcomp(data[,2:9], scale. = TRUE)   # PCA on standardized variables
g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1, 
              groups = data.class, ellipse = TRUE, circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
              legend.position = 'top')
print(g)
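If installing ggbiplot from GitHub is not an option, the same kind of figure can be approximated with base graphics only. The sketch below is an assumption-laden alternative, not ggbiplot's implementation. It uses iris as stand-in data (for the question, swap in data[,2:9] and data$Waterbody), the hypothetical helper cov_ellipse() with chi-square scaling, and an arbitrary stretch factor k so the loading arrows are visible on the score scale.

```r
# Stand-in data; replace with data[, 2:9] and data$Waterbody for the question
X      <- iris[, 1:4]
groups <- iris$Species                      # grouping vector must be a factor

pca       <- prcomp(X, scale. = TRUE)
pc_scores <- pca$x[, 1:2]                   # observation scores on PC1/PC2
var_load  <- pca$rotation[, 1:2]            # variable loadings for the arrows

# Hypothetical helper: covariance ellipse of a 2-column matrix, chi-square scaled
cov_ellipse <- function(xy, conf = 0.95, n = 100) {
  r     <- sqrt(qchisq(conf, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  e     <- eigen(cov(xy), symmetric = TRUE)
  half  <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), 2) %*% t(e$vectors)
  sweep(r * cbind(cos(theta), sin(theta)) %*% half, 2, colMeans(xy), "+")
}

# One ellipse per factor level (list names follow levels(groups))
ellipses <- lapply(split(as.data.frame(pc_scores), groups),
                   function(g) cov_ellipse(as.matrix(g), conf = 0.95))

cols <- seq_along(levels(groups))
plot(pc_scores, pch = 21, bg = cols[as.integer(groups)],
     xlab = "PC1", ylab = "PC2", asp = 1)
for (i in seq_along(ellipses)) lines(ellipses[[i]], col = cols[i])

k <- 2                                      # arbitrary arrow stretch factor
arrows(0, 0, k * var_load[, 1], k * var_load[, 2], length = 0.1)
text(k * 1.1 * var_load, labels = rownames(var_load), cex = 0.8)
legend("topright", legend = levels(groups), pt.bg = cols, pch = 21)
```

Because the ellipses are computed from the same scores that are plotted, they stay aligned with the points regardless of how the axes are scaled.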

Comments:

Do you know how to make the arrows and ellipses thicker, and how to add repel to the labels in a ggbiplot PCA?

Answer 3:

Just adding this because it may help new users:

If your grouping data is categorical, you must wrap it in as.factor; otherwise you will get:

(Error: Must use a vector in `[`, not an object of class matrix.)

Use this instead:

data.pca <- prcomp(dataPCA[,2:4], scale. = TRUE)
g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1, 
              groups = as.factor(dataPCA$Gender), ellipse = TRUE, circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', legend.position = 'top')
print(g)
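The fix amounts to making sure the vector passed as groups is a factor. A small base-R sketch of checking and converting the grouping column follows. It uses iris with its species column deliberately stored as character, which is what read.csv() returns for text columns by default since R 4.0.0, as a stand-in for a column like dataPCA$Gender.

```r
# Stand-in data: iris with the grouping column stored as plain character
dat <- iris
dat$Species <- as.character(dat$Species)

is.factor(dat$Species)              # FALSE: character vector, not a factor
grp <- as.factor(dat$Species)       # convert once, then pass grp as `groups =`
is.factor(grp)                      # TRUE
levels(grp)                         # "setosa" "versicolor" "virginica"
table(grp)                          # 50 observations per level
```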
