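I am using k-means to classify (cluster) my data.
I used the elbow method and the silhouette method to find the best k and to validate that choice.
How do I now assign each row to a cluster and plot the distribution?
Can you help me with this?
Here is my code: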
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score
%matplotlib inline
df_diabetes = pd.read_csv('diabetes.csv')
# Drop the "Classe" column so that only the features remain
df_diabetes_noclass = df_diabetes.drop('Classe', axis=1)
df_diabetes_noclass.head()
nomes = df_diabetes_noclass.columns
valores = df_diabetes_noclass.values
escala_min_max = preprocessing.MinMaxScaler()
valores_normalizados = escala_min_max.fit_transform(valores)
df_diabetes_normalizado = pd.DataFrame(valores_normalizados)
df_diabetes_normalizado.columns = nomes
df_diabetes_normalizado.head(5)
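sse = {}
for k in range(1, 10):
    # Fit on the normalized features only. Writing the labels back into this
    # frame inside the loop would feed them in as a feature on the next pass.
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(df_diabetes_normalizado)
    sse[k] = kmeans.inertia_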
plt.figure(figsize=(14,9))
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()
# Use only the feature columns (ignore a "clusters" column if one was added earlier)
X = df_diabetes_normalizado.drop(columns="clusters", errors="ignore")
for n_cluster in range(2, 11):
    kmeans = KMeans(n_clusters=n_cluster).fit(X)
    label = kmeans.labels_
    sil_coeff = silhouette_score(X, label, metric='euclidean')
    print("For n_clusters={}, the silhouette coefficient is {}".format(n_cluster, sil_coeff))
I now need to classify my data and create a plot like the one below.
If you want to predict which cluster new data belongs to, you need to use the predict method.
Here is the documentation link for the predict method:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans.predict
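As a rough sketch of how this could look, the snippet below fits KMeans on scaled data, reads the cluster of each training row from `labels_`, and calls `predict` on new samples. The synthetic two-column data, the choice of `n_clusters=2`, and the names `raw`, `new_raw` and `scaler` are assumptions for illustration only; in the question they would be the diabetes features and the k chosen from the elbow and silhouette steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real features (assumption: two numeric columns,
# two well-separated groups of 50 rows each).
rng = np.random.default_rng(0)
raw = np.vstack([
    rng.normal(loc=[1.0, 1.0], scale=0.1, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2)),
])

# Scale the features the same way as in the question (min-max to [0, 1]).
scaler = MinMaxScaler()
X = scaler.fit_transform(raw)

# n_clusters=2 is assumed here; use the k picked from the elbow/silhouette step.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Cluster index assigned to each training row.
labels = kmeans.labels_

# New samples must go through the already-fitted scaler before predict.
new_raw = np.array([[1.1, 0.9], [4.8, 5.2]])
new_labels = kmeans.predict(scaler.transform(new_raw))
print(new_labels)
```

To plot the distribution per cluster, one option is to store `labels` as a new column of the DataFrame after fitting and pass that column name as `hue` to a seaborn function such as `sns.pairplot` or `sns.scatterplot`.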