Visualizing K-Means Results
【Title】: Visualizing K-Means Results
【Posted】: 2015-06-12 03:06:57
【Question】: I ran K-means clustering in Java using the java.ml library. The program runs and returns results, but I have no way to visualize them, graphically or otherwise.
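Below are the results of running K-means (with 3 clusters) 10,000 times on my dataset. Each line shows one cluster; each vector is enclosed in square brackets and followed by its label. There are 20 vectors in total, so each vector has 21 elements: 20 holding its Euclidean distances to each of the vectors, plus 1 label.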
[[1.379991791, 1.66939694, 1.652221825, 0.0, 1.608524279, 1.301876739, 1.636492612, 1.133148287, 1.375364013, 1.364991995, 1.439442397, 1.367471641, 1.504259981, 1.455670286, 1.824078594, 0.988337864, 1.254416064, 1.690600364, 1.430614044, 0.0];Kildare, [0.897891569, 1.294965189, 1.272747286, 0.988337864, 0.933043989, 0.64633397, 0.811119483, 0.684254345, 0.791362362, 0.955215387, 0.922512053, 0.995936718, 0.914462079, 0.811321056, 1.489063939, 0.0, 0.833095188, 1.186101833, 0.975048682, 0.0];Meirionydd]
[[0.756018671, 0.0, 1.072767514, 1.66939694, 0.968021022, 0.878913081, 1.053487571, 0.946233907, 0.853126598, 0.774003925, 0.904752776, 0.626444915, 0.733226007, 0.846187129, 0.646029512, 1.294965189, 1.03147877, 1.153793764, 0.662439781, 0.0];Cavan, [0.571149363, 1.072767514, 0.0, 1.652221825, 0.919429121, 0.885531302, 1.069393713, 0.835662283, 0.801244949, 0.759666276, 0.558907577, 0.795476373, 0.779915224, 0.818519916, 1.348409486, 1.272747286, 0.690020061, 0.364744462, 0.697372769, 0.0];Antrim, [0.704713714, 1.053487571, 1.069393713, 1.636492612, 0.696494351, 0.632876921, 0.0, 0.732336012, 0.62846986, 0.806731342, 0.732648598, 0.881896693, 0.586654974, 0.517816625, 1.353456579, 0.811119483, 0.854281544, 0.940822405, 0.748577229, 0.0];Argyll, [1.062301796, 0.646029512, 1.348409486, 1.824078594, 1.319502557, 1.137207988, 1.353456579, 1.176050599, 1.17655296, 1.144257376, 1.199916147, 0.944738211, 1.003826094, 1.09273497, 0.0, 1.489063939, 1.258678525, 1.392239237, 0.974877701, 0.0];Louth, [0.66004187, 1.153793764, 0.364744462, 1.690600364, 0.997476722, 0.930192019, 0.940822405, 0.897316137, 0.877910352, 0.890074516, 0.572702407, 0.912309578, 0.811922993, 0.82701613, 1.392239237, 1.186101833, 0.822317065, 0.0, 0.785857414, 0.0];Sligo]
[[0.0, 0.756018671, 0.571149363, 1.379991791, 0.598705502, 0.440823111, 0.704713714, 0.369950812, 0.344139502, 0.322141594, 0.30307596, 0.373943809, 0.30130205, 0.361096583, 1.062301796, 0.897891569, 0.42575944, 0.66004187, 0.22320165, 0.0];Meath, [0.598705502, 0.968021022, 0.919429121, 1.608524279, 0.0, 0.400200356, 0.696494351, 0.681657903, 0.349425634, 0.581232269, 0.705377386, 0.756150959, 0.594426492, 0.58152343, 1.319502557, 0.933043989, 0.616820158, 0.997476722, 0.695759027, 0.0];Cork, [0.440823111, 0.878913081, 0.885531302, 1.301876739, 0.400200356, 0.0, 0.632876921, 0.379156937, 0.270057141, 0.515322174, 0.587669753, 0.579062531, 0.449839788, 0.372156447, 1.137207988, 0.64633397, 0.401875688, 0.930192019, 0.527217051, 0.0];Anglesey, [0.369950812, 0.946233907, 0.835662283, 1.133148287, 0.681657903, 0.379156937, 0.732336012, 0.0, 0.373812011, 0.426943065, 0.514824591, 0.472071956, 0.42128179, 0.374679842, 1.176050599, 0.684254345, 0.369857776, 0.897316137, 0.432282187, 0.0];Cumbria, [0.344139502, 0.853126598, 0.801244949, 1.375364013, 0.349425634, 0.270057141, 0.62846986, 0.373812011, 0.0, 0.308911819, 0.50471573, 0.51995704, 0.372325424, 0.393606022, 1.17655296, 0.791362362, 0.439352024, 0.877910352, 0.460110427, 0.0];Donegal, [0.322141594, 0.774003925, 0.759666276, 1.364991995, 0.581232269, 0.515322174, 0.806731342, 0.426943065, 0.308911819, 0.0, 0.481033789, 0.356496842, 0.390556187, 0.508317882, 1.144257376, 0.955215387, 0.572062535, 0.890074516, 0.364283224, 0.0];Down, [0.30307596, 0.904752776, 0.558907577, 1.439442397, 0.705377386, 0.587669753, 0.732648598, 0.514824591, 0.50471573, 0.481033789, 0.0, 0.51402604, 0.437007227, 0.479873162, 1.199916147, 0.922512053, 0.58160753, 0.572702407, 0.391509468, 0.0];Dublin, [0.373943809, 0.626444915, 0.795476373, 1.367471641, 0.756150959, 0.579062531, 0.881896693, 0.472071956, 0.51995704, 0.356496842, 0.51402604, 0.0, 0.358149113, 0.468362451, 0.944738211, 0.995936718, 0.648482615, 0.912309578, 0.230425463, 
0.0];Fermanagh, [0.30130205, 0.733226007, 0.779915224, 1.504259981, 0.594426492, 0.449839788, 0.586654974, 0.42128179, 0.372325424, 0.390556187, 0.437007227, 0.358149113, 0.0, 0.202949102, 1.003826094, 0.914462079, 0.587557668, 0.811922993, 0.247069837, 0.0];Kilkenny, [0.361096583, 0.846187129, 0.818519916, 1.455670286, 0.58152343, 0.372156447, 0.517816625, 0.374679842, 0.393606022, 0.508317882, 0.479873162, 0.468362451, 0.202949102, 0.0, 1.09273497, 0.811321056, 0.512935905, 0.82701613, 0.345322661, 0.0];Liverpool, [0.42575944, 1.03147877, 0.690020061, 1.254416064, 0.616820158, 0.401875688, 0.854281544, 0.369857776, 0.439352024, 0.572062535, 0.58160753, 0.648482615, 0.587557668, 0.512935905, 1.258678525, 0.833095188, 0.0, 0.822317065, 0.569894325, 0.0];Orcades, [0.22320165, 0.662439781, 0.697372769, 1.430614044, 0.695759027, 0.527217051, 0.748577229, 0.432282187, 0.460110427, 0.364283224, 0.391509468, 0.230425463, 0.247069837, 0.345322661, 0.974877701, 0.975048682, 0.569894325, 0.785857414, 0.0, 0.0];Tyrone, [0.478273792, 0.998584641, 0.897044755, 1.115020212, 0.625093719, 0.349427287, 0.77612927, 0.268249218, 0.372374521, 0.524273498, 0.653109111, 0.648992694, 0.578933739, 0.51634108, 1.238311432, 0.685177096, 0.297897403, 0.981250327, 0.596559145, 0.0];Wicklow]
Can anyone suggest a way to visualize these results in some meaningful way?
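【Comments】:
Can the data items in any one cluster be plotted as points on a 2D plane?
Try ELKI (which includes visualization) instead of java-ml (which has been dead for years). Also note that k-means does not use pairwise distances!
【Answer 1】: When visualizing high-dimensional data, I suggest using a method that reduces the dimensionality to 2 or 3 and then plotting the result. Here are two suggestions: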
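Principal component analysis (PCA); the answer linked to an example Java implementation.
t-Distributed Stochastic Neighbor Embedding (t-SNE); implementations are also available in Java.
There are, of course, other dimensionality-reduction methods. Try searching for "dimensionality reduction".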
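To make the first suggestion concrete, here is a minimal sketch of PCA in plain Java that projects each row onto its top principal axes, giving 2D coordinates that can be plotted and colored by cluster. It is not the implementation the answer linked to: the class name `PcaSketch`, the method `project`, and the choice of power iteration with Gram-Schmidt orthogonalization are all assumptions made for illustration, and the points in `main` are toy data, not the dataset from the question.

```java
// Hypothetical helper (not part of java.ml or ELKI): a small PCA sketch.
final class PcaSketch {

    /** Projects each row of data onto the top-k principal axes; returns an n x k matrix. */
    static double[][] project(double[][] data, int k) {
        int n = data.length, d = data[0].length;

        // 1. Center each column on its mean.
        double[][] x = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            for (int i = 0; i < n; i++) x[i][j] = data[i][j] - mean;
        }

        // 2. Sample covariance matrix (d x d); assumes at least two rows.
        double[][] cov = new double[d][d];
        for (int a = 0; a < d; a++)
            for (int b = 0; b < d; b++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += x[i][a] * x[i][b];
                cov[a][b] = s / (n - 1);
            }

        // 3. Leading eigenvectors by power iteration, kept orthogonal to earlier axes.
        double[][] axes = new double[k][];
        for (int c = 0; c < k; c++) {
            double[] v = new double[d];
            for (int j = 0; j < d; j++) v[j] = 1.0 / (j + 1 + c); // deterministic start
            orthonormalize(v, axes, c);
            for (int iter = 0; iter < 500; iter++) {
                double[] w = new double[d];
                for (int a = 0; a < d; a++)
                    for (int b = 0; b < d; b++) w[a] += cov[a][b] * v[b];
                // Stop if no variance is left in the remaining directions.
                if (orthonormalize(w, axes, c) < 1e-12) break;
                v = w;
            }
            axes[c] = v;
        }

        // 4. Coordinates of each centered row along the k axes.
        double[][] out = new double[n][k];
        for (int i = 0; i < n; i++)
            for (int c = 0; c < k; c++)
                for (int j = 0; j < d; j++) out[i][c] += x[i][j] * axes[c][j];
        return out;
    }

    /** Removes components along the first `count` axes, normalizes v in place, returns the norm before normalizing. */
    static double orthonormalize(double[] v, double[][] axes, int count) {
        for (int c = 0; c < count; c++) {
            double dot = 0;
            for (int j = 0; j < v.length; j++) dot += v[j] * axes[c][j];
            for (int j = 0; j < v.length; j++) v[j] -= dot * axes[c][j];
        }
        double norm = 0;
        for (double t : v) norm += t * t;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int j = 0; j < v.length; j++) v[j] /= norm;
        return norm;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional points, for illustration only.
        double[][] toy = {{0, 0, 1}, {1, 2, 0}, {2, 4, 1}, {3, 6, 0}};
        for (double[] p : project(toy, 2)) System.out.println(p[0] + "\t" + p[1]);
    }
}
```

Applied to the question's data, each 20-element vector would be one input row, and the two output columns would serve as x and y coordinates in a scatter plot, with one color per cluster. The sign of each axis is arbitrary, so the plot may come out mirrored between runs of different implementations.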