How to draw the plot of within-cluster sum-of-squares for a cluster?

Posted: 2014-11-15 13:14:31

Question:

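I have a cluster plot made in R, and I want to optimize the clustering with the "elbow criterion" using a WSS (within-cluster sum-of-squares) plot, but I don't know how to draw a WSS plot for a given clustering. Can anyone help me?

Here is my data: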

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)

Here is my clustering code:

cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)

After running the code above I get a dendrogram, but how can I draw a WSS plot like the one used for the elbow criterion?

Answer 1:

If I follow what you want, then we need a function to compute the WSS

wss <- function(d) {
  ## centre each column, then sum the squared deviations
  sum(scale(d, scale = FALSE)^2)
}

and a wrapper for this wss() function

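wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)             ## cluster membership for i clusters
  spl <- split(x, cl)             ## split the rows of x by cluster
  total <- sum(sapply(spl, wss))  ## sum the per-cluster WSS values
  total
}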

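This wrapper takes the following arguments as input:

i: the number of clusters to cut the data into
hc: the hierarchical cluster analysis object
x: the original data

wrap() then cuts the dendrogram into i clusters, splits the original data by the cluster membership given in cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.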
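To make the per-cluster computation concrete, here is a minimal sketch on a tiny data set. The `toy` data frame and the names `hc_toy`, `groups`, `per_cluster`, `total_wss`, and `manual_first` are made up for illustration and are not part of the original answer.

```r
## Toy data (made up for illustration): two well-separated groups in 2-D
toy <- data.frame(x = c(0, 2, 10, 12), y = c(0, 0, 5, 5))

wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)  # centre each column, sum squared deviations
}

hc_toy <- hclust(dist(toy))
groups <- cutree(hc_toy, k = 2)   # cluster membership for a 2-cluster cut

## WSS per cluster, then the total for this clustering
per_cluster <- sapply(split(toy, groups), wss)
total_wss   <- sum(per_cluster)

## The same quantity by hand for the first cluster (rows 1 and 2):
## squared deviations from the cluster mean, summed over both columns
manual_first <- sum((toy$x[1:2] - mean(toy$x[1:2]))^2) +
                sum((toy$y[1:2] - mean(toy$y[1:2]))^2)

print(per_cluster)
print(total_wss)
```

With a single cluster the WSS is simply the total sum of squares around the overall column means, which is why the curve always starts at its maximum and falls to zero once every observation sits in its own cluster.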

We run all of this with sapply() over the number of clusters 1, 2, ..., nrow(data), where cl is the object returned by hclust():

res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)

The WSS plot can then be drawn using

plot(seq_along(res), res, type = "b", pch = 19)

Here is an example using the famous Edgar Anderson Iris data set:

iris2 <- iris[, 1:4]  # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")

## Takes a little while as we evaluate all implied clustering up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)

This gives a plot of the WSS against the number of clusters.

We can zoom in by showing only the first 50 cluster counts

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)

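which gives a closer view of the same curve.

You can speed up the main computation step by running sapply() via an appropriate parallel alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.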

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
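As one possible parallel alternative, the sketch below uses parSapply() from the base parallel package. The answer does not name a specific method, so this choice, the worker count of 2, and the names `workers`, `hc_iris`, `res_serial`, and `res_par` are assumptions made for illustration.

```r
library(parallel)

wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

iris2   <- iris[, 1:4]
hc_iris <- hclust(dist(iris2), method = "ward.D")

## Serial reference result for the first 50 cluster counts
res_serial <- sapply(seq_len(50), wrap, hc = hc_iris, x = iris2)

## Assumption: 2 worker processes; adjust to the machine at hand
workers <- makeCluster(2)
## wrap() calls wss(), so wss() must be available on each worker
clusterExport(workers, "wss", envir = environment())
res_par <- parSapply(workers, seq_len(50), wrap, hc = hc_iris, x = iris2)
stopCluster(workers)

plot(seq_along(res_par), res_par, type = "b", pch = 19)
```

For a data set as small as iris the overhead of starting workers may outweigh any gain, so this is more likely to pay off with many observations or many cluster counts.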

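Comments:

Thanks! But why are the values on the y-axis so large when my data values are really small? Also, could you answer my other question about the WSS plot?: ***.com/questions/25977798/…

The values on the y-axis are determined by the scale of the variables in the data. I'll take a look at the other Q.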
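The dependence on scale can be checked directly. The sketch below reuses the iris data from the answer; the names `wss_raw`, `wss_mm`, and `wss_scaled` are made up for illustration, and standardizing with scale() is shown only as one option, not something the answer prescribes.

```r
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

iris2 <- iris[, 1:4]

wss_raw    <- wss(iris2)         # original units (cm)
wss_mm     <- wss(iris2 * 10)    # same data in mm: WSS grows by 10^2
wss_scaled <- wss(scale(iris2))  # each column standardized to unit variance

print(wss_mm / wss_raw)
print(wss_scaled)
```

Because the WSS is a sum of squared deviations, multiplying the data by a constant multiplies the WSS by the square of that constant; the shape of the curve, and therefore the position of the elbow, does not change under such a uniform rescaling.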
