Predicting values with the k-means clustering algorithm

User

Reply 1 · edited 2024-09-21 04:46:57

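If you are thinking of assigning a value based on the mean within the nearest cluster, you are describing a form of "soft decoder": it estimates not only the correct value of the coordinate but also how confident you are in that estimate. The alternative is a "hard decoder", where only the values 0 and 1 are legal (the ones that occur in the training data set) and the new coordinate gets the median of the values within the nearest cluster. My guess is that you should always assign each coordinate only a known, valid class value (0 or 1), and that averaging class values is not a valid approach.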
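To make the difference concrete, here is a minimal sketch, assuming the class labels are binary and that the labels of the training points in the nearest cluster have already been collected. The array `labels_in_nearest_cluster` and its values are made up for illustration.

```python
import numpy as np

# Hypothetical data: binary labels (0 or 1) of the training points that
# fell into the cluster nearest to the new point.
labels_in_nearest_cluster = np.array([1, 1, 0, 1, 1])

# "Soft" prediction: the mean is a fraction in [0, 1], which can be read
# as a confidence that the label is 1. It is not itself a valid class value.
soft_prediction = labels_in_nearest_cluster.mean()

# "Hard" prediction: with an odd number of binary labels the median is
# always 0 or 1 (the majority label). With an even number it can be 0.5,
# so a tie-breaking rule would be needed in that case.
hard_prediction = int(np.median(labels_in_nearest_cluster))

print(soft_prediction)  # 0.8
print(hard_prediction)  # 1
```

The soft value can still be turned into a valid class by thresholding it, which for binary labels gives the same answer as the majority vote.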

User

Reply 2 · edited 2024-09-21 04:46:57

I know I may be late, but here is my general approach to your problem:

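import numpy as np

def predict(data, centroids):
    # Return (and print) the index of the nearest centroid for each row of data.
    centroids, data = np.array(centroids), np.array(data)
    distances = []
    for unit in data:
        for center in centroids:
            # squared Euclidean distance from this point to this centroid
            distances.append(np.sum((unit - center) ** 2))
    # one row per data point, one column per centroid
    distances = np.reshape(distances, (len(data), len(centroids)))
    closest_centroid = [np.argmin(dist) for dist in distances]
    print(closest_centroid)
    return closest_centroid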
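The same nearest-centroid lookup can also be written without explicit loops. This is only a sketch using NumPy broadcasting; the name `predict_vectorized` and the sample values are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def predict_vectorized(data, centroids):
    """Return the index of the nearest centroid for each row of data."""
    data = np.asarray(data, dtype=float)            # shape (n_points, n_features)
    centroids = np.asarray(centroids, dtype=float)  # shape (n_clusters, n_features)
    # Broadcasting gives differences of shape (n_points, n_clusters, n_features).
    diff = data[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distances, shape (n_points, n_clusters).
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.argmin(sq_dist, axis=1)

labels = predict_vectorized([[67, 78], [100, 80]], [[54, 85], [99, 78]])
print(labels)  # [0 1]
```

Squared distances are enough here because taking the square root does not change which centroid is closest.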

User

Reply 3 · edited 2024-09-21 04:46:57

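To assign a new data point to one of the clusters created by k-means, you just find the centroid nearest to that point.

In other words, it is the same step used to iteratively assign each point in the original data set to one of the k clusters. The only difference is that the centroids used for this computation are the final set, i.e. the centroid values from the last iteration.

Here is an implementation in Python (with NumPy):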

>>> import numpy as NP
>>> # just made up values--based on your spec (2D data + 2 clusters)
>>> centroids = NP.array([[54, 85], [99, 78]])
>>> centroids
      array([[54, 85],
             [99, 78]])

>>> # randomly generate a new data point within the problem domain:
>>> new_data = NP.array([67, 78])

>>> # to assign a new data point to a cluster ID,
>>> # find its closest centroid:
>>> diff = centroids - new_data  # NumPy broadcasting
>>> diff
      array([[-13,   7],
             [ 32,   0]])

>>> dist = NP.sqrt(NP.sum(diff**2, axis=-1))  # Euclidean distance
>>> dist
      array([ 14.76,  32.  ])

>>> closest_centroid = centroids[NP.argmin(dist),]
>>> closest_centroid
       array([54, 85])
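To show that prediction reuses the assignment step from training, here is a small end-to-end sketch. It assumes plain Lloyd-style iterations, made-up training data, and that no cluster ever becomes empty; the helper names `assign` and `kmeans` are hypothetical.

```python
import numpy as np

def assign(points, centroids):
    # Assignment step: index of the nearest centroid for every point.
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    return np.argmin(np.sum(diff ** 2, axis=-1), axis=1)

def kmeans(points, init_centroids, n_iter=10):
    # Plain Lloyd iterations; assumes no cluster ever becomes empty.
    centroids = init_centroids.astype(float)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Update step: each centroid moves to the mean of its points.
        centroids = np.array([points[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return centroids

# Made-up training data: two well-separated groups in 2D.
train = np.array([[50., 80.], [55., 90.], [57., 85.],
                  [95., 75.], [100., 80.], [102., 79.]])
final_centroids = kmeans(train, init_centroids=train[[0, 3]])

# Predicting for a new point reuses the same assignment step,
# now with the final centroids held fixed.
new_point = np.array([[67., 78.]])
print(assign(new_point, final_centroids))  # [0]
```

With this data the final centroids come out as (54, 85) and (99, 78), so the new point lands in cluster 0.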
