KPrototypes overwhelmed by a numerical variable (Python)

Posted 2024-10-01 07:22:05


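Update: The clustering may be overwhelmed by ER_Visits because the unique values in each categorical variable all have the same frequency. In this dataset, each row represents one subgroup. For example, in the Sex variable, Female and Male appear in the same number of rows. ER_Visits holds the actual frequency. K-Prototypes groups categorical data by mode, but since every categorical value appears in the same number of rows, there is no meaningful mode to compute. That may be why the ER_Visits variable dominates the result.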
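A quick way to check this suspicion is to count how often each category occurs. The sketch below uses a small made-up frame (the `Sex` and `Geography` values are placeholders, not the real data) with one row per subgroup, and tests whether every value of a column has the same row count.

```python
import itertools
import pandas as pd

# Hypothetical stand-in for the real data: one row per (Sex, Geography) subgroup.
sexes = ["Female", "Male"]
regions = ["North", "South", "West"]
df = pd.DataFrame(list(itertools.product(sexes, regions)),
                  columns=["Sex", "Geography"])

# A column is "balanced" when all of its categories occur equally often,
# which means the column has no unique mode.
balanced = {
    col: bool(df[col].value_counts().nunique() == 1)
    for col in df.columns
}
print(balanced)
```

If every categorical column in the real frame comes out balanced, the row counts alone carry no information about which category is most common, which is consistent with the explanation above.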

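I plan to create a new data frame in which each row is copied x times, where x is the corresponding ER_Visits value. Then I would drop the ER_Visits column and fit a model with K-Modes. Would that work?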
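The row expansion described here could look roughly like the following. This is only a sketch on invented toy data, and it assumes ER_Visits holds non-negative integers; the column values are placeholders.

```python
import pandas as pd

# Hypothetical toy data: one row per subgroup, ER_Visits is that subgroup's count.
df = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", "Male"],
    "Geography": ["North", "North", "South", "South"],
    "ER_Visits": [3, 1, 0, 4],
})

# Repeat each row ER_Visits times; rows with a count of 0 drop out entirely.
expanded = (
    df.loc[df.index.repeat(df["ER_Visits"])]
      .drop(columns="ER_Visits")
      .reset_index(drop=True)
)

# After expansion, category frequencies reflect the visit counts.
sex_counts = expanded["Sex"].value_counts()
print(sex_counts)
```

The expanded frame contains only categorical columns, so it could then be passed to a K-Modes implementation. Note that the number of rows equals the total number of visits, so large counts may make this approach memory-hungry.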

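I am using K-Prototypes to cluster a mix of categorical and numerical data. I have only one numerical column, ER_Visits, which I have scaled with MinMaxScaler (default range [0, 1]). However, when I visualize the K-Prototypes result using ER_Visits against any categorical variable, I still get horizontal clusters. How can I fix this?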

Here is the code:

import numpy as np
import seaborn as sns
from kmodes.kprototypes import KPrototypes
from sklearn.preprocessing import MinMaxScaler

# Normalize ER_Visits (MinMaxScaler's default feature_range is (0, 1))
df_2019['ER_Scaled'] = MinMaxScaler().fit_transform(np.array(df_2019['ER_Visits']).reshape(-1, 1)).ravel()

# Get the position of categorical columns
catColumns = list(df_2019.select_dtypes('object').columns)
catColumnsPos = [df_2019.columns.get_loc(col) for col in catColumns]
print('Categorical columns           : {}'.format(catColumns))
print('Categorical columns position  : {}'.format(catColumnsPos))

# Note: this matrix still contains the raw ER_Visits column next to ER_Scaled
dfMatrix = df_2019.to_numpy()

# Fit the cluster
# Assumed: the labels are stored as cluster_id, the column the plot uses as hue
kprototype = KPrototypes(n_jobs=-1, n_clusters=5, init='Huang', random_state=0)
df_2019['cluster_id'] = kprototype.fit_predict(dfMatrix, categorical=catColumnsPos)

# Visualization
# df_nonzero is not defined in the post; presumably the rows of df_2019 with non-zero ER_Visits
ax = sns.stripplot(data=df_nonzero, x='Geography', y='ER_Scaled', hue='cluster_id')
ax.set_yscale('log')
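One thing worth checking in the code above: the matrix is built from the whole of df_2019, so the unscaled ER_Visits column is passed to the model together with ER_Scaled. A minimal sketch of building the feature matrix without the raw column follows. The frame here is an invented stand-in for df_2019, and the min-max scaling is done by hand only to keep the example dependency-free.

```python
import pandas as pd

# Hypothetical stand-in for df_2019 (values are made up).
df = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", "Male"],
    "Geography": ["North", "North", "South", "South"],
    "ER_Visits": [120, 5, 40, 0],
})

# Min-max scale ER_Visits to [0, 1].
er = df["ER_Visits"]
df["ER_Scaled"] = (er - er.min()) / (er.max() - er.min())

# Keep only the columns the model should see: drop the raw count.
features = df.drop(columns=["ER_Visits"])

# Positions of the non-numeric (categorical) columns within `features`.
cat_cols = features.select_dtypes(exclude="number").columns
cat_pos = [features.columns.get_loc(c) for c in cat_cols]

X = features.to_numpy()
print(cat_pos, X.shape)
```

`X` and `cat_pos` could then be passed to `fit_predict` in place of `dfMatrix` and `catColumnsPos`. If the numerical column still dominates after that, the `gamma` argument of `KPrototypes`, which weights the categorical part of the distance relative to the numerical part, may be worth experimenting with.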

This is the result I get. I also tried Cause and Sex on the x-axis, but got similar results.
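To confirm numerically that the clusters are horizontal bands, one option is to compare the ER_Scaled range of each cluster. The sketch below uses made-up labels and values, and assumes the labels live in a `cluster_id` column as the plot's `hue` suggests.

```python
import pandas as pd

# Hypothetical labelled data (values and labels are invented).
df = pd.DataFrame({
    "ER_Scaled": [0.01, 0.02, 0.10, 0.15, 0.60, 0.90],
    "cluster_id": [0, 0, 1, 1, 2, 2],
})

# Min and max of ER_Scaled per cluster, ordered from lowest band to highest.
ranges = (
    df.groupby("cluster_id")["ER_Scaled"]
      .agg(["min", "max"])
      .sort_values("min")
)

# The bands are disjoint if each cluster starts above the previous cluster's max.
lower = ranges["min"].iloc[1:].to_numpy()
upper = ranges["max"].iloc[:-1].to_numpy()
disjoint = bool((lower > upper).all())
print(ranges)
print("disjoint bands:", disjoint)
```

Disjoint ranges would indicate that cluster membership is determined almost entirely by the numerical column, with the categorical variables contributing little.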

Before trying K-Prototypes, I encoded all categorical variables except Cause and fit a K-means model. The K-means visualization did show some patterns.


0 answers

No answers yet.
