I have the following code:
import pandas as pd

def order(frame, feature):
    # Takes the dataset and the list of features that need to be ordered.
    # Note: every feature in the list must be qualitative data.
    # Returns:
    # 1. the dataset with each feature ordered against the target label
    # 2. the list of ordered qualitative feature names
    frame_cop = frame.copy()
    qual_encoded = []
    for f in feature:
        ordering = pd.DataFrame()
        ordering['val'] = frame_cop[f].unique()
        ordering.index = ordering.val
        ordering['spmean'] = frame_cop[[f, 'SalePrice']].groupby(f).mean()['SalePrice']
        ordering = ordering.sort_values('spmean')
        ordering['ordering'] = range(1, ordering.shape[0]+1)
        ordering = ordering['ordering'].to_dict()
        for cat, o in ordering.items():
            frame_cop.loc[frame_cop[f] == cat, f] = o
        qual_encoded.append(f)
    return frame_cop, qual_encoded
When I try to plot a heatmap like this:
plt.figure()
train_csv_up, qual_order = order(train_csv, qualitative_card)
sns.heatmap(train_csv_up[qual_order].corr())
I get the error message: ValueError: zero-size array to reduction operation fmin which has no identity
Or, if I add ['SalePrice'] as follows:
plt.figure()
train_csv_up, qual_order = order(train_csv, qualitative_card)
sns.heatmap(train_csv_up[qual_order+['SalePrice']].corr())
the heatmap shows only SalePrice.
However, if I change these lines:
    for cat, o in ordering.items():
        frame_cop.loc[frame_cop[f] == cat, f] = o
    qual_encoded.append(f)
to
    for cat, o in ordering.items():
        frame_cop.loc[frame_cop[f] == cat, f+'_q'] = o
    qual_encoded.append(f+'_q')
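then it works (with and without SalePrice), but I would like to avoid having a new list of feature names and keep f instead of f+'_q'.
I don't understand why it works in one case and not in the other.
Looking forward to reading your comments.
Regards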
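
A likely explanation, not verified against the original train_csv: overwriting an existing string column through .loc leaves its dtype as object even when every value ends up being an integer, whereas writing to a column that does not exist yet creates a new numeric column. Older pandas versions silently dropped non-numeric columns in corr(), which would leave an empty correlation matrix (hence the zero-size array error) or only SalePrice. The sketch below uses hypothetical toy data (df, a "Quality" column and a hand-written ordering dict are all made up for illustration) and passes numeric_only=True, available in pandas 1.5 and later, to reproduce that dropping behaviour explicitly.

```python
import pandas as pd

# Hypothetical toy data standing in for train_csv; dtype=object mimics
# how string columns were stored by the pandas versions in question.
df = pd.DataFrame({
    "Quality": pd.Series(["low", "high", "mid", "high", "low", "mid"], dtype=object),
    "SalePrice": [100, 300, 200, 320, 110, 210],
})
ordering = {"low": 1, "mid": 2, "high": 3}  # hand-written for illustration

# Case 1: overwrite the existing column through .loc, as in order().
in_place = df.copy()
for cat, o in ordering.items():
    in_place.loc[in_place["Quality"] == cat, "Quality"] = o

# Case 2: write to a column that does not exist yet (the f+'_q' variant).
new_col = df.copy()
for cat, o in ordering.items():
    new_col.loc[new_col["Quality"] == cat, "Quality_q"] = o

print(in_place["Quality"].dtype)   # still object, although every value is an int
print(new_col["Quality_q"].dtype)  # a numeric dtype

# numeric_only=True drops the object column, so only SalePrice survives.
print(list(in_place.corr(numeric_only=True).columns))

# Possible fix: convert the overwritten column to a numeric dtype.
fixed = in_place.copy()
fixed["Quality"] = pd.to_numeric(fixed["Quality"])
print(list(fixed.corr(numeric_only=True).columns))
```

If this is indeed the cause, adding frame_cop[f] = pd.to_numeric(frame_cop[f]) after the inner loop in order() should keep the original column names. pd.to_numeric is suggested here rather than astype(int) on the assumption that some features may contain missing values, which the == comparison never matches and which an integer cast would reject.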