<p>To assign a new data point to one of the clusters created by k-means, you just find the centroid closest to that point.</p>
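<p>In other words, it is the same step used on each iteration to assign every point in the original data set to one of the k clusters. The only difference is that the centroids used for this computation are the <em>final</em> set, i.e., the centroid values from the <em>last</em> iteration.</p>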
<p>Here is one implementation in <em>Python</em> (with NumPy):</p>
<pre><code>>>> import numpy as NP
>>> # made-up values, based on your spec (2D data + 2 clusters)
>>> centroids = NP.array([[54, 85], [99, 78]])
>>> centroids
array([[54, 85],
       [99, 78]])
>>> # a new data point within the problem domain (also made up):
>>> new_data = NP.array([67, 78])
>>> # to assign a new data point to a cluster ID,
>>> # find its closest centroid:
>>> diff = centroids - new_data  # NumPy broadcasting
>>> diff
array([[-13,   7],
       [ 32,   0]])
>>> dist = NP.sqrt(NP.sum(diff**2, axis=-1))  # Euclidean distance
>>> dist
array([14.76482306, 32.        ])
>>> closest_centroid = centroids[NP.argmin(dist)]
>>> closest_centroid
array([54, 85])
</code></pre>
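The session above assigns a single point. When several new points arrive at once, the same nearest-centroid rule can be vectorized with broadcasting. Below is a minimal sketch. It assumes the final centroids are available as a NumPy array of shape (k, n_features). The function name `assign_to_clusters` is hypothetical and is not part of NumPy or any clustering library.

```python
import numpy as np


def assign_to_clusters(points, centroids):
    """Return the index of the nearest centroid for each row of `points`.

    Hypothetical helper (illustration only).
    points:    array-like of shape (n_points, n_features), or one point
    centroids: array-like of shape (k, n_features), assumed to be the
               final centroids from a completed k-means run
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast to shape (n_points, k, n_features): the difference of
    # every point from every centroid.
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distance. The square root is skipped because it
    # does not change which centroid is nearest.
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Index of the closest centroid per point (ties go to the lowest index).
    return np.argmin(sq_dist, axis=1)


centroids = np.array([[54, 85], [99, 78]])
new_points = np.array([[67, 78], [95, 80], [50, 90]])
labels = assign_to_clusters(new_points, centroids)
print(labels)  # [0 1 0]
```

The intermediate `diff` array has shape (n_points, k, n_features). For very large batches you may want to process the points in chunks to limit memory use.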