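I am trying to get reproducible results from sklearn's KMeans running in a Google Colab notebook. KMeans is applied to an array produced by principal component analysis (PCA). Every time I restart the notebook's runtime and then fit, predict, and compute the silhouette score for the K-means clustering, the silhouette score changes.
Here is the code I use to fit and predict with KMeans and compute the silhouette score: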
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

for n_clusters in range(3, 9):
    kmeans = KMeans(init='k-means++', n_clusters=n_clusters, n_init=25, random_state=0)
    kmeans.fit(pca_mat_products)
    clusters = kmeans.predict(pca_mat_products)
    silhouette_avg = silhouette_score(mp_matrix, clusters, random_state=0)
    print("For n_clusters =", n_clusters, "The average silhouette_score is :", silhouette_avg)
Here is an example of the silhouette scores it produces:
For n_clusters = 3 The average silhouette_score is : 0.08689747798228342
For n_clusters = 4 The average silhouette_score is : 0.11513524544540599
For n_clusters = 5 The average silhouette_score is : 0.13225896257848024
For n_clusters = 6 The average silhouette_score is : 0.13390795741576195
For n_clusters = 7 The average silhouette_score is : 0.11262045164741093
For n_clusters = 8 The average silhouette_score is : 0.12179451798486395
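When I restart the notebook's runtime, keep everything in the notebook unchanged (including random_state=0), and run the cells from the top, I get new silhouette scores after every restart.
Here are the silhouette scores the same code produced in a different run: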
For n_clusters = 3 The average silhouette_score is : 0.09181951382862036
For n_clusters = 4 The average silhouette_score is : 0.11539863985647045
For n_clusters = 5 The average silhouette_score is : 0.13363229313208771
For n_clusters = 6 The average silhouette_score is : 0.13428788881085452
For n_clusters = 7 The average silhouette_score is : 0.13187306014661757
For n_clusters = 8 The average silhouette_score is : 0.13252806332855294
The silhouette scores keep changing in subsequent runtimes.
mp_matrix is a one-hot encoded array that looks like this:
array([[0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
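Is it normal for the silhouette scores to change after restarting the runtime in Google Colab? Is there any way to get reproducible silhouette scores?
I have searched here and elsewhere online and have not found anyone discussing this problem.
Thanks, everyone, for your help!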
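Judging by your code, you are clustering on the output of PCA.
If you need reproducible results from PCA as well, set random_state there too. PCA uses random_state when svd_solver is 'arpack' or 'randomized', and the default 'auto' may select the randomized solver for large inputs.
See the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html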
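As a rough illustration of that suggestion, here is a minimal sketch that seeds both PCA and KMeans. The question does not show how pca_mat_products was built, so the random binary matrix, the n_components=10 setting, the explicit svd_solver='randomized', and the helper name cluster_scores are all assumptions made for the example, not the asker's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for mp_matrix: a small random 0/1 matrix.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 40))


def cluster_scores(data, seed=0):
    """Return {n_clusters: silhouette score} with PCA and KMeans both seeded."""
    # Assumption: the randomized solver is in use, which is where
    # PCA's random_state actually matters.
    pca = PCA(n_components=10, svd_solver="randomized", random_state=seed)
    reduced = pca.fit_transform(data)

    scores = {}
    for n_clusters in range(3, 6):
        kmeans = KMeans(init="k-means++", n_clusters=n_clusters,
                        n_init=5, random_state=seed)
        labels = kmeans.fit_predict(reduced)
        # As in the question, the score is computed on the original matrix.
        scores[n_clusters] = silhouette_score(data, labels)
    return scores


first = cluster_scores(X, seed=0)
second = cluster_scores(X, seed=0)
for k in first:
    print("For n_clusters =", k, "scores:", first[k], second[k])
```

With both estimators seeded, the two calls should print matching scores for each n_clusters. If the scores still differ across runtime restarts, it is worth checking whether any earlier step that builds the input matrix (sampling, shuffling, row ordering) is itself unseeded.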