Tutorial for scipy.cluster.hierarchy

a = np.array([[0, 0 ], [1, 0 ], [0, 1 ], [1, 1 ], [0.5, 0 ], [0, 0.5], [0.5, 0.5], [2, 2 ], [2, 3 ], [3, 2 ], [3, 3 ]])

1 answer

User

Answer 1 · posted 2024-05-11 13:27:46

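Hierarchical agglomerative clustering (HAC) has three steps:

Quantify the data (the metric argument)
Cluster the data (the method argument)
Choose the number of clusters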
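
As a rough sketch of how these three steps map onto SciPy calls, the snippet below runs them one at a time on the array from the question. Passing a condensed distance matrix from pdist to linkage is just one way to make step 1 visible, and the choice of two clusters in step 3 is an assumption made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

# Step 1: quantify the data as condensed pairwise distances.
d = pdist(a, metric='euclidean')

# Step 2: cluster the data by repeatedly merging the closest groups.
z = linkage(d, method='single')

# Step 3: choose a number of clusters (2 is an assumption here)
# and cut the hierarchy so that at most that many clusters remain.
labels = fcluster(z, t=2, criterion='maxclust')
print(labels)
```

Calling linkage(a) on the raw observations, as the answer does next, performs steps 1 and 2 in a single call.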

Calling

z = linkage(a)

carries out the first two steps. Since you did not specify any parameters, it uses the default values

metric = 'euclidean'
method = 'single'
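
A quick way to convince yourself of these defaults is to compare the bare call with one that spells them out. The sketch below does that on the array from the question and also prints the shape and last row of the result; the variable names z_default and z_explicit are only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

z_default = linkage(a)
z_explicit = linkage(a, method='single', metric='euclidean')

# Both calls should describe the same hierarchy.
print(np.allclose(z_default, z_explicit))

# One row per merge, so n observations give n - 1 rows. Each row holds:
# index of cluster 1, index of cluster 2, merge distance, size of the new cluster.
print(z_default.shape)
print(z_default[-1])  # the final merge joins all observations
```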

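So z = linkage(a) gives you a single-linkage hierarchical agglomerative clustering of a. This clustering is a hierarchy of solutions, and from that hierarchy you can learn something about the structure of your data. What you might do now is: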

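Check which metric is appropriate, e.g. cityblock or chebyshev will quantify your data differently (cityblock, euclidean and chebyshev correspond to the L1, L2 and L_inf norms)
Check the different properties and behaviours of the methods (e.g. single, complete and average)
Check how to determine the number of clusters, e.g. by reading the wiki about it
Compute indices on the solutions (clusterings) you found, such as the silhouette coefficient (this coefficient tells you how well a point/observation fits the cluster the clustering assigned it to). Different indices use different criteria to judge a clustering.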

Here is something to start with:

import numpy as np
import scipy.cluster.hierarchy as hac
import matplotlib.pyplot as plt


a = np.array([[0.1,   2.5],
              [1.5,   .4 ],
              [0.3,   1  ],
              [1  ,   .8 ],
              [0.5,   0  ],
              [0  ,   0.5],
              [0.5,   0.5],
              [2.7,   2  ],
              [2.2,   3.1],
              [3  ,   2  ],
              [3.2,   1.3]])

fig, axes23 = plt.subplots(2, 3)

for method, axes in zip(['single', 'complete'], axes23):
    z = hac.linkage(a, method=method)

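    # Scree plot: merge distances, ordered from the last merge back to the first
    axes[0].plot(range(1, len(z)+1), z[::-1, 2])
    # Second difference of the merge distances; its peaks mark candidate knees
    knee = np.diff(z[::-1, 2], 2)
    axes[0].plot(range(2, len(z)), knee)

    # Strongest knee first, then the second strongest after zeroing out the first
    num_clust1 = knee.argmax() + 2
    knee[knee.argmax()] = 0
    num_clust2 = knee.argmax() + 2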

    axes[0].text(num_clust1, z[::-1, 2][num_clust1-1], 'possible\n<- knee point')

    part1 = hac.fcluster(z, num_clust1, 'maxclust')
    part2 = hac.fcluster(z, num_clust2, 'maxclust')

    clr = ['#2200CC' ,'#D9007E' ,'#FF6600' ,'#FFCC00' ,'#ACE600' ,'#0099CC' ,
    '#8900CC' ,'#FF0000' ,'#FF9900' ,'#FFFF00' ,'#00CC01' ,'#0055CC']

    for part, ax in zip([part1, part2], axes[1:]):
        for cluster in set(part):
            ax.scatter(a[part == cluster, 0], a[part == cluster, 1], 
                       color=clr[cluster])

    m = '\n(method: {})'.format(method)
    plt.setp(axes[0], title='Screeplot{}'.format(m), xlabel='partition',
             ylabel='{}\ncluster distance'.format(m))
    plt.setp(axes[1], title='{} Clusters'.format(num_clust1))
    plt.setp(axes[2], title='{} Clusters'.format(num_clust2))

plt.tight_layout()
plt.show()

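This gives a figure with one row per method (single, complete): a scree plot with the knee curve on the left, followed by scatter plots of the two candidate partitions.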
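
The knee heuristic from the script can also be tried without any plotting. The sketch below wraps it in a hypothetical helper, knee_candidates, and applies it to the array from the question rather than the one in the script; it is only meant to show what the second difference picks out, not to be a general rule for choosing the number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def knee_candidates(z):
    """Two candidate cluster counts (hypothetical helper mirroring the script)."""
    heights = z[::-1, 2]          # merge distances, last merge first
    knee = np.diff(heights, 2)    # second difference, a new array
    # +2 converts an index in the second difference to the x position
    # on the scree plot, which the script uses as the number of clusters.
    first = int(knee.argmax()) + 2
    knee[first - 2] = 0           # drop the strongest peak, look for the next
    second = int(knee.argmax()) + 2
    return first, second


a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

for method in ('single', 'complete'):
    z = linkage(a, method=method)
    k1, k2 = knee_candidates(z)
    # Cluster sizes for the first candidate; labels start at 1.
    sizes = np.bincount(fcluster(z, k1, 'maxclust'))[1:]
    print(method, k1, k2, sizes)
```

Because the heuristic only looks at jumps in the merge distances, it is worth checking its suggestion against the scatter plots or an index such as the silhouette coefficient.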
