Classifying data after finding the best number of clusters - Sklearn

Posted 2024-09-27 21:28:23


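I am using KMeans to classify my data.

I used the elbow method and the silhouette method to find the best number of clusters k and to validate that choice.

How do I now classify the data and plot the distribution?

Could you help me with this?

Here is my code: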

import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score
%matplotlib inline

df_diabetes = pd.read_csv('diabetes.csv')


# Drop the "Classe" (class label) column
df_diabetes_noclass = df_diabetes.drop('Classe', axis=1)
df_diabetes_noclass.head()


nomes = df_diabetes_noclass.columns
valores = df_diabetes_noclass.values
escala_min_max = preprocessing.MinMaxScaler()
valores_normalizados = escala_min_max.fit_transform(valores)
df_diabetes_normalizado = pd.DataFrame(valores_normalizados)
df_diabetes_normalizado.columns = nomes
df_diabetes_normalizado.head(5)


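sse = {}
for k in range(1, 10):
    # Fit on the normalized features only; do not add a label column here,
    # or it would be treated as a feature in the next iteration
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(df_diabetes_normalizado)
    sse[k] = kmeans.inertia_  # within-cluster sum of squared distances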
plt.figure(figsize=(14,9))
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Numero de Clusters")
plt.ylabel("SSE")
plt.show()


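# Features only; drop a cluster-label column if one was added earlier.
# KMeans is unsupervised, so no y is needed.
X = df_diabetes_normalizado.drop(columns="clusters", errors="ignore")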

for n_cluster in range(2, 11):
    kmeans = KMeans(n_clusters=n_cluster).fit(X)
    label = kmeans.labels_
    sil_coeff = silhouette_score(X, label, metric='euclidean')
    print("Para n_clusters={}, O Coeficiente de silueta é {}".format(n_cluster, sil_coeff))
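The silhouette scores printed above can also be compared in code to pick k. The following is only a sketch: `X_demo` is synthetic stand-in data from `make_blobs` (an assumption, since diabetes.csv is not available here), and `n_init` and `random_state` are set only to make the run repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: synthetic stand-in for the normalized features; use your own X instead
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 11):
    # fit_predict fits the model and returns one cluster label per row
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels, metric="euclidean")

# The k with the highest mean silhouette coefficient
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

The highest silhouette score is one reasonable criterion, not the only one; it is worth checking that it agrees with the elbow in the SSE curve.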

I now need to classify my data and create a plot like the one in the image below.

[image not available]
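Since the target image is not available, the following is only one possible way to do this, not necessarily the plot the question had in mind. It fits KMeans once with the chosen k, stores the label of each row in a new column, and draws a scatter plot colored by cluster. The data frame `df_demo`, its column names, and `k = 3` are assumptions for illustration; PCA is used only to project the features to two dimensions for plotting.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Assumption: synthetic stand-in for df_diabetes_normalizado (4 numeric features)
values, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
df_demo = pd.DataFrame(values, columns=["f1", "f2", "f3", "f4"])

k = 3  # assumption: the k chosen from the elbow and silhouette results
feature_cols = list(df_demo.columns)  # fix the feature list before adding labels

# Fit once with the chosen k and store each row's cluster label
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
df_demo["cluster"] = kmeans.fit_predict(df_demo[feature_cols])

# Project to 2D for plotting only; the clustering itself used all features
coords = PCA(n_components=2).fit_transform(df_demo[feature_cols])

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(coords[:, 0], coords[:, 1],
                     c=df_demo["cluster"], cmap="viridis", s=20)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend(*scatter.legend_elements(), title="cluster")
plt.show()

# Number of rows assigned to each cluster
print(df_demo["cluster"].value_counts().sort_index())
```

With the real data, `df_demo` would be replaced by the normalized feature frame, and the `cluster` column could then be used to group or summarize the original rows.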

