How to track a specific item's presence in hierarchical clustering

Posted 2023-02-19

【Title】How can I track a specific item's presence in hierarchical clustering? 【Posted】2015-01-07 01:05:27 【Question】:

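I have a question about hierarchical clustering. I have a relatively complex dataset of 2000 items/samples. I cluster the items with scipy and try different cutoff values for the clusters, for example from 0.1 to 0.9: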

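from scipy.cluster import hierarchy as hac
# method and metric must be passed as strings
Z = hac.linkage(distance, method='single', metric='euclidean')
# cutoff is the distance threshold, e.g. 0.1, 0.2, ..., 0.9
results = hac.fcluster(Z, cutoff, criterion='distance')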

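How can I check/track a given item, for example that it is in group x at a cutoff of 0.1 and in group y at a cutoff of 0.2, and so on?

I considered displaying the dendrogram, but wouldn't tracing 1 item out of 2000 samples on a dendrogram be too cumbersome?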

【Answer 1】:

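Try using set(list(..)) to build a set of cluster IDs, which removes duplicates, then iterate over its elements and filter the data by the cluster each point belongs to. Treat this as something to try: it is untested, because you did not provide a data sample.

Your code would look like this: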

import numpy as np

clusterIDs = set(list(results))  # unique cluster IDs
D = {}  # dictionary storing clusterID: array of points that belong to that cluster
for clusterID in clusterIDs:
    # boolean comparison selects the rows of data assigned to this cluster
    clusterItems = data[np.where(results == clusterID)]
    D[clusterID] = clusterItems
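To follow a single item across several cutoffs, as the question asks, the same filtering idea can be repeated once per cutoff. The sketch below is only an illustration under stated assumptions: the random `data` array, the helper name `track_item`, and the choice of item index 5 are hypothetical and not part of the original post. Note that the integer labels returned by `fcluster` are not guaranteed to refer to the same group at different cutoffs, so the sketch also records the member indices of the item's cluster.

```python
import numpy as np
from scipy.cluster import hierarchy as hac

# Hypothetical stand-in data: 20 random 2-D points (the question's data is not shown).
rng = np.random.default_rng(0)
data = rng.random((20, 2))

# Single-linkage tree built from the raw observations.
Z = hac.linkage(data, method='single', metric='euclidean')


def track_item(Z, item, cutoffs):
    """Hypothetical helper: map each cutoff to (cluster label, member indices) for one item."""
    history = {}
    for cutoff in cutoffs:
        # One flat clustering per cutoff; labels[i] is the cluster of sample i.
        labels = hac.fcluster(Z, t=cutoff, criterion='distance')
        label = labels[item]
        # Indices of every sample that shares the item's cluster at this cutoff.
        members = np.flatnonzero(labels == label)
        history[cutoff] = (int(label), members)
    return history


cutoffs = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
history = track_item(Z, item=5, cutoffs=cutoffs)

for cutoff, (label, members) in history.items():
    print(f"cutoff={cutoff}: item 5 is in cluster {label} with {len(members)} member(s)")
```

With 2000 samples this avoids reading the dendrogram at all: `history[0.2][1]` gives the indices of the samples grouped with the item at a cutoff of 0.2, and comparing the member sets between cutoffs shows when the item's cluster merges with others.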
