kmeans returns NaN values?

Posted 2024-05-17 06:33:56


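I recently came across a k-means tutorial that looks a bit different from the algorithm as I remember it, but it is still k-means, so it should behave the same. I tried it on some data; here is the code: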

# Assignment stage:

import numpy as np

def assignment(data, centroids):
    for i in centroids.keys():
        #sqrt((x1-x2)^2+(y1-y2)^2 + etc)
        data['distance_from_{}'.format(i)]= (
            np.sqrt((data['soloRatio']-centroids[i][0])**2
            +(data['secStatus']-centroids[i][1])**2
            +(data['shipsDestroyed']-centroids[i][2])**2
            +(data['combatShipsLost']-centroids[i][3])**2
            +(data['miningShipsLost']-centroids[i][4])**2
            +(data['exploShipsLost']-centroids[i][5])**2
            +(data['otherShipsLost']-centroids[i][6])**2
        ))


    print(data['distance_from_{}'.format(i)])
    centroid_distance_cols = ['distance_from_{}'.format(i) for i in centroids.keys()]

    data['closest'] = data.loc[:, centroid_distance_cols].idxmin(axis=1)
    data['closest'] = data['closest'].astype(str).str.replace('\D+', '')
    return data


data = assignment(data, centroids)

And:

#Update stage:


import copy

old_centroids = copy.deepcopy(centroids)

def update(k):
    for i in centroids.keys():
        centroids[i][0]=np.mean(data[data['closest']==i]['soloRatio'])
        centroids[i][1]=np.mean(data[data['closest']==i]['secStatus'])
        centroids[i][2]=np.mean(data[data['closest']==i]['shipsDestroyed'])
        centroids[i][3]=np.mean(data[data['closest']==i]['combatShipsLost'])
        centroids[i][4]=np.mean(data[data['closest']==i]['miningShipsLost'])
        centroids[i][5]=np.mean(data[data['closest']==i]['exploShipsLost'])
        centroids[i][6]=np.mean(data[data['closest']==i]['otherShipsLost'])
    return k


#TODO: add graphical representation?

while True:
    closest_centroids = data['closest'].copy(deep=True)
    centroids = update(centroids)
    data = assignment(data,centroids)
    if(closest_centroids.equals(data['closest'])):
        break

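When I run the initial assignment stage, it returns distances, but once I run the update stage, all the distance values become NaN. I can't figure out why this happens or at which point it happens. Maybe I've made a mistake that I can't see.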
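One possible cause, offered as a guess from the posted code rather than a confirmed diagnosis: after `.astype(str).str.replace(...)` the `closest` column holds strings, while the keys of `centroids` are presumably integers (an assumption, since the dictionary is not shown). If so, `data['closest'] == i` matches no rows, the mean of the empty selection is NaN, and every distance computed from those centroids is NaN as well. On pandas 2.0 and later, `str.replace` also treats the pattern literally unless `regex=True` is passed, so the `distance_from_` prefix may not even be stripped. The sketch below reproduces the effect on a made-up two-row frame; the values and the key are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the dtype of 'closest' matters here.
toy = pd.DataFrame({
    'soloRatio': [10.0, 6.0],
    # String labels, as produced by .astype(str) in the assignment stage.
    'closest': pd.Series(['1', '2'], dtype=object),
})

key = 1  # assumed integer centroid key

# Comparing string labels with an integer key matches nothing.
selected = toy[toy['closest'] == key]['soloRatio']
empty_mean = selected.mean()  # mean of an empty Series is NaN

# Casting the labels to int makes the comparison match again.
toy['closest'] = toy['closest'].astype(int)
fixed_mean = toy[toy['closest'] == key]['soloRatio'].mean()

print(len(selected), empty_mean, fixed_mean)
```

If this is what is happening, converting `closest` to integers (or using string keys in `centroids`) should make the update stage select rows again.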

Here is an excerpt of the data I'm working with:

   Unnamed: 0  characterID  combatShipsLost  exploShipsLost  miningShipsLost  \
0           0   90000654.0              8.0             4.0              5.0   
1           1   90001581.0             97.0             5.0              1.0   
2           2   90001595.0             61.0             0.0              0.0   
3           3   90002023.0             22.0             1.0              0.0   
4           4   90002030.0             74.0             0.0              1.0   

   otherShipsLost  secStatus  shipsDestroyed  soloRatio  
0             0.0   5.003100             1.0       10.0  
1             0.0   2.817807          6251.0        6.0  
2             0.0  -2.015310           752.0        0.0  
3             4.0   5.002769            43.0        5.0  
4             1.0   3.090204           301.0        7.0 
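For comparison, here is a minimal sketch of the same assignment/update loop run on the five rows shown. It stores the cluster labels as integers throughout, so that `data['closest'] == k` can actually match rows, and it keeps the previous centroid when a cluster ends up empty. Several things here are assumptions and not taken from the question: two clusters, integer keys `1` and `2`, initial centroids seeded from rows 0 and 1, and the helper name `features`.

```python
import numpy as np
import pandas as pd

# Feature columns, in the order used by the distance formula in the question.
features = ['soloRatio', 'secStatus', 'shipsDestroyed', 'combatShipsLost',
            'miningShipsLost', 'exploShipsLost', 'otherShipsLost']

# The five rows from the excerpt.
data = pd.DataFrame({
    'soloRatio':       [10.0, 6.0, 0.0, 5.0, 7.0],
    'secStatus':       [5.003100, 2.817807, -2.015310, 5.002769, 3.090204],
    'shipsDestroyed':  [1.0, 6251.0, 752.0, 43.0, 301.0],
    'combatShipsLost': [8.0, 97.0, 61.0, 22.0, 74.0],
    'miningShipsLost': [5.0, 1.0, 0.0, 0.0, 1.0],
    'exploShipsLost':  [4.0, 5.0, 0.0, 1.0, 0.0],
    'otherShipsLost':  [0.0, 0.0, 0.0, 4.0, 1.0],
})

# Assumed initial centroids: integer keys, seeded from rows 0 and 1.
centroids = {1: data.loc[0, features].to_numpy(dtype=float),
             2: data.loc[1, features].to_numpy(dtype=float)}


def assignment(data, centroids):
    keys = list(centroids)
    points = data[features].to_numpy()
    # One column of Euclidean distances per centroid.
    dists = np.column_stack([
        np.sqrt(((points - centroids[k]) ** 2).sum(axis=1)) for k in keys
    ])
    # Store the integer key of the nearest centroid directly.
    data['closest'] = np.array(keys)[dists.argmin(axis=1)]
    return data


def update(data, centroids):
    for k in centroids:
        members = data.loc[data['closest'] == k, features]
        if len(members) > 0:  # keep the old centroid if the cluster is empty
            centroids[k] = members.mean().to_numpy()
    return centroids


data = assignment(data, centroids)
while True:
    previous = data['closest'].copy()
    centroids = update(data, centroids)
    data = assignment(data, centroids)
    if previous.equals(data['closest']):
        break

print(data['closest'].tolist())
```

With those seeds the loop should settle on the labels `[1, 2, 1, 1, 1]`, since `shipsDestroyed` dominates the distances and row 1 is far from the rest. Scaling the features before clustering would change that, but that is a separate choice from the NaN problem.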
