Hierarchical clustering labels based on merge order in Python

1 answer

User

#1 · Posted 2024-10-02 00:29:40

The linkage matrix produced by the scipy.cluster.hierarchy functions has an extra field holding the number of observations in the newly formed cluster:

scipy.cluster.hierarchy.linkage: A (n−1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n+i. A cluster with an index less than n corresponds to one of the n original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.
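To make the four columns concrete, here is a minimal sketch on a made-up 1-D dataset (the `points` values are my own assumption, not from the question). It prints each row of Z in the terms used by the documentation above.

```python
import numpy as np
from scipy.cluster.hierarchy import single
from scipy.spatial.distance import pdist

# Hypothetical toy data: four points on a line (not from the question).
points = np.array([[0.0], [1.0], [5.0], [6.5]])
n = len(points)

# Single-linkage clustering on the condensed distance matrix.
Z = single(pdist(points))

# Each row i describes one merge: the two cluster indices, the merge
# distance, and the number of original observations in the new cluster n+i.
for i, (a, b, dist, size) in enumerate(Z):
    print(f"step {i}: merge {int(a)} and {int(b)} -> cluster {n + i} "
          f"(distance {dist:.2f}, size {int(size)})")
```

Indices below n (here 0 to 3) are original observations; indices 4 and up are clusters created by earlier rows.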

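I'm not sure I follow your example completely [1], but you can use the cluster sizes to choose the depth at which to cut the tree into a flat list of clusters, which gets you something along those lines. For example, the rule could be "stop at the last merge where the new cluster's size is still 2 or smaller" (giving the first list, with 3 clusters) or "stop at the first merge where the new cluster's size is 3 or larger" (giving the second list, with 2 clusters).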

Here is an example dataset that yields a hierarchical clustering similar to the one shown in your plot, with results that match your two examples:

import numpy as np
from scipy.cluster.hierarchy import single, fcluster
from scipy.spatial.distance import pdist

X = [
    (0, 0, .45), # P1
    (0, .36, 0), # P2
    (0, 0, 0), # P3
    (.3, 0, 0), # P4
    (.31, .36, 0), # P5
]

Z = single(pdist(X))

i1 = np.argwhere(Z[:,3] <= 2)[-1,0]        # => i1 = 1
d1 = Z[i1, 2]                              # => d1 = 0.31
c1 = fcluster(Z, d1, criterion='distance') # => c1 = [3, 2, 1, 1, 2]
# i.e., three clusters: {P3, P4}, {P2, P5} and {P1}

i2 = np.argwhere(Z[:,3] >= 3)[0,0]         # => i2 = 2
d2 = Z[i2, 2]                              # => d2 = 0.36
c2 = fcluster(Z, d2, criterion='distance') # => c2 = [2, 1, 1, 1, 1]
# i.e., two clusters: {P2, P3, P4, P5} and {P1}
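If you want to reuse the two rules, they can be wrapped in small helpers. This is only a sketch: the function names `cut_at_last_small_merge` and `cut_at_first_large_merge` are hypothetical, and the logic is the same as above (pick a row of Z by cluster size, then cut at that row's distance).

```python
import numpy as np
from scipy.cluster.hierarchy import single, fcluster
from scipy.spatial.distance import pdist

def cut_at_last_small_merge(Z, max_size=2):
    """Cut at the height of the last merge whose new cluster has <= max_size points."""
    rows = np.flatnonzero(Z[:, 3] <= max_size)
    if rows.size == 0:
        raise ValueError("no merge produces a cluster that small")
    return fcluster(Z, Z[rows[-1], 2], criterion='distance')

def cut_at_first_large_merge(Z, min_size=3):
    """Cut at the height of the first merge whose new cluster has >= min_size points."""
    rows = np.flatnonzero(Z[:, 3] >= min_size)
    if rows.size == 0:
        raise ValueError("no merge produces a cluster that large")
    return fcluster(Z, Z[rows[0], 2], criterion='distance')

# Same dataset as above (P1..P5).
X = [(0, 0, .45), (0, .36, 0), (0, 0, 0), (.3, 0, 0), (.31, .36, 0)]
Z = single(pdist(X))

c1 = cut_at_last_small_merge(Z)    # three clusters: {P3, P4}, {P2, P5}, {P1}
c2 = cut_at_first_large_merge(Z)   # two clusters: {P2, P3, P4, P5}, {P1}
print(c1, c2)
```

Keep in mind that the cut is made by distance, so any other merge at the same or a smaller height is included in the result as well.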

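[1] Wouldn't "the first merge at least" already happen as soon as P3 and P4 are merged, leaving just 4 clusters? There is also no reason to expect the "second merge" to always join two pairs: it could just as well join a single observation with a pair. That is why I suggest using cluster sizes rather than "N mergings".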
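To illustrate the footnote, here is a small made-up dataset (my own assumption, not from the question) where the second merge joins a single observation with an existing pair, so counting merges does not tell you how big the clusters are.

```python
import numpy as np
from scipy.cluster.hierarchy import single
from scipy.spatial.distance import pdist

# Hypothetical 1-D data: three nearby points and one far away.
chain = np.array([[0.0], [1.0], [2.5], [10.0]])
Z_chain = single(pdist(chain))

# Size of the newly formed cluster at each merge step.
sizes = Z_chain[:, 3].astype(int)
print(sizes)  # [2 3 4]: the second merge adds a single point to a pair
```

For the P1..P5 dataset above the sizes are 2, 2, 4, 5 instead, so "after two merges" means something quite different in the two cases.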
