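I'm using KMedoids from sklearn_extra.cluster. I use it with a precomputed distance matrix (metric='precomputed'), and it used to work. However, we found a flaw in the method that computed the distance matrix, so we had to implement it ourselves. Since then, the KMedoids algorithm no longer works. This is the output: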
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 1 is empty! self.labels_[self.medoid_indices_[1]] may not be labeled with its corresponding cluster (1).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 2 is empty! self.labels_[self.medoid_indices_[2]] may not be labeled with its corresponding cluster (2).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 3 is empty! self.labels_[self.medoid_indices_[3]] may not be labeled with its corresponding cluster (3).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 4 is empty! self.labels_[self.medoid_indices_[4]] may not be labeled with its corresponding cluster (4).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 5 is empty! self.labels_[self.medoid_indices_[5]] may not be labeled with its corresponding cluster (5).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 6 is empty! self.labels_[self.medoid_indices_[6]] may not be labeled with its corresponding cluster (6).
warnings.warn(
C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn_extra\cluster\_k_medoids.py:231: UserWarning: Cluster 7 is empty! self.labels_[self.medoid_indices_[7]] may not be labeled with its corresponding cluster (7).
warnings.warn(
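I have checked the distance matrix: it is a 2D ndarray of shape n_data x n_data with zeros on the diagonal, so that should not be the problem. All values are between 0 and 1. We used to use this algorithm for the Gower distance, but for some reason it does not work when we only have categorical data. All of our values are booleans. The Gower distance function fails with the following: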
File "C:\Users\...\AppData\Local\Programs\Python\Python38-32\lib\site-packages\gower\gower_dist.py", line 62, in gower_matrix
Z_num = np.divide(Z_num ,num_max,out=np.zeros_like(Z_num), where=num_max!=0)
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode '?') according to the casting rule ''same_kind''
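For a table in which every feature is boolean and all features have equal weight, the Gower dissimilarity between two rows reduces to the fraction of features on which they disagree. A minimal NumPy sketch of that special case follows; `boolean_gower_matrix` is a hypothetical helper name, not part of the gower package, and it does not handle missing values or mixed column types.

```python
import numpy as np

def boolean_gower_matrix(X):
    """Pairwise fraction of mismatching features for an all-boolean table."""
    X = np.asarray(X, dtype=bool)
    n_features = X.shape[1]
    Xf = X.astype(np.float64)  # work in floats so the result is a float matrix
    both_true = Xf @ Xf.T                    # features that are True in both rows
    both_false = (1.0 - Xf) @ (1.0 - Xf).T   # features that are False in both rows
    mismatches = n_features - both_true - both_false
    return mismatches / n_features

X = np.array([[True, False, True],
              [True, False, True],
              [False, False, True],
              [False, True, False]])
D = boolean_gower_matrix(X)
```

The result is a symmetric float64 matrix with a zero diagonal and values between 0 and 1, which is the shape of input that metric='precomputed' expects.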
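I also tried pykmedoids, which did work. However, you need to define the initial medoids yourself using pyclustering, and the method I found does not work for categorical data (see below).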
initial_medoids = kmeans_plusplus_initializer(data, n_clus, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize(return_index=True)
Stack trace:
File "path_to_file", line 19, in <module>
initial_medoids = kmeans_plusplus_initializer(data, n_clus, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize(return_index=True)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 357, in initialize
index_point = self.__get_next_center(centers)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 256, in __get_next_center
distances = self.__calculate_shortest_distances(self.__data, centers)
File "path\Python\Python38-32\lib\site-packages\pyclustering\cluster\center_initializer.py", line 236, in __calculate_shortest_distances
dataset_differences[index_center] = numpy.sum(numpy.square(data - center), axis=1).T
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
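The last frame of this traceback fails on `data - center` because both operands are boolean arrays. The NumPy-only sketch below reproduces that failure and shows that casting to float first lets the same expression run. It is an assumption that passing a float copy of the data to the initializer would be enough in the real script, and whether squared Euclidean distance is a sensible basis for seeding categorical data is a separate question.

```python
import numpy as np

data = np.array([[True, False, True],
                 [False, False, True]])
center = data[0]

# Same expression as in the traceback: boolean minus boolean.
try:
    data - center
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# After casting to float the subtraction is well defined.
data_float = data.astype(np.float64)
center_float = data_float[0]
sq_dist = np.sum(np.square(data_float - center_float), axis=1)
```

For 0/1 data the squared Euclidean distance computed this way equals the number of features on which two rows differ.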
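My problem could be solved in any of three ways, so I hope someone can help me with one of them.
I have posted a simplified version of my code below: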
import pandas as pd
import gower_distance as gd  # our own module; calcDist is assumed to live here
from sklearn_extra.cluster import KMedoids
data = pd.read_csv(path_to_data)
dist = gd.calcDist(data) # Returns NxN array where N is the number of data points
# I'm using 8 clusters, which is the default, so I haven't defined it
kmedoids = KMedoids(metric='precomputed').fit(dist)
labels = kmedoids.predict(dist)
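One possible reason for empty clusters, offered only as a guess about this case, is that the data contains fewer distinct points than the 8 clusters requested, which can easily happen with all-boolean rows. With a zero diagonal, two identical rows of the distance matrix mean the two points are at distance 0 from each other. The sketch below collects a few basic facts about a precomputed matrix before it is passed to fit; `describe_distance_matrix` is a hypothetical helper.

```python
import numpy as np

def describe_distance_matrix(D, n_clusters=8):
    """Collect basic facts about a precomputed distance matrix."""
    D = np.asarray(D)
    report = {
        "is_square": D.ndim == 2 and D.shape[0] == D.shape[1],
        "is_float": bool(np.issubdtype(D.dtype, np.floating)),
    }
    if not report["is_square"]:
        return report  # the remaining checks need a square matrix
    Df = D.astype(np.float64)
    report["is_finite"] = bool(np.all(np.isfinite(Df)))
    report["is_symmetric"] = bool(np.allclose(Df, Df.T))
    report["zero_diagonal"] = bool(np.allclose(np.diag(Df), 0.0))
    # Identical rows correspond to points that cannot be told apart.
    report["n_distinct_rows"] = int(np.unique(Df, axis=0).shape[0])
    report["enough_distinct_points"] = report["n_distinct_rows"] >= n_clusters
    return report

D = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
report = describe_distance_matrix(D, n_clusters=3)
```

Here the first two points coincide, so the matrix has only two distinct rows and three non-empty clusters cannot be formed.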
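To get the cluster labels from the trained model (i.e. the training labels), read kmedoids.labels_.
To use kmedoids.predict with the trained k-medoids model on any data you want to predict, you need to compute the N x K distance matrix from the N prediction points to the K medoids, correctly indexed. You can see more in the source code.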
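The indexing step can be sketched in plain NumPy. The sketch assumes you hold the distances from each of the N new points to every training point, and that `medoid_indices` contains the training-set positions of the K medoids (the fitted model exposes these as `medoid_indices_`, as the warning text shows). `assign_to_nearest_medoid` is a hypothetical helper that illustrates the idea; it is not the library's own code.

```python
import numpy as np

def assign_to_nearest_medoid(dist_to_train, medoid_indices):
    """Label each new point with the position of its closest medoid."""
    dist_to_train = np.asarray(dist_to_train, dtype=np.float64)
    # Keep only the columns that belong to medoids: shape N x K.
    dist_to_medoids = dist_to_train[:, medoid_indices]
    return np.argmin(dist_to_medoids, axis=1)

# 3 new points, 4 training points, medoids at training positions 0 and 3.
dist_to_train = np.array([[0.1, 0.9, 0.5, 0.8],
                          [0.7, 0.2, 0.6, 0.0],
                          [0.4, 0.4, 0.3, 0.9]])
medoid_indices = np.array([0, 3])
labels = assign_to_nearest_medoid(dist_to_train, medoid_indices)
```

Each label is an index into `medoid_indices`, so label 1 here refers to the medoid at training position 3.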
I also got this warning (using Euclidean distance, though). Switching to a different initialization of the cluster centers fixed it for me.
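As an illustration of what an alternative initialization can look like on a precomputed distance matrix (not necessarily the one meant above, and not the library's own method), the sketch below picks medoids greedily: it starts from the most central point and then repeatedly adds the point farthest from all medoids chosen so far. `farthest_first_medoids` is a hypothetical helper. It stops early when every remaining point is at distance 0 from a chosen medoid, which is the situation that produces empty clusters.

```python
import numpy as np

def farthest_first_medoids(D, n_clusters):
    """Greedy farthest-first choice of medoid indices from a distance matrix."""
    D = np.asarray(D, dtype=np.float64)
    # Start from the point with the smallest total distance to all others.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    min_dist = D[medoids[0]].copy()  # distance of each point to its nearest medoid
    while len(medoids) < n_clusters:
        candidate = int(np.argmax(min_dist))
        if min_dist[candidate] == 0.0:
            break  # every remaining point coincides with a chosen medoid
        medoids.append(candidate)
        min_dist = np.minimum(min_dist, D[candidate])
    return medoids

# Six points on a line, forming two well separated groups.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
medoids = farthest_first_medoids(D, n_clusters=2)
```

Because it returns indices rather than coordinates, the result needs no subtraction on the raw boolean data.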