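I am working on a machine learning project that classifies households into one of four poverty categories.
I have a dataset in which each row is an observation of one person.
The features in each observation describe the household that person belongs to.
Some rows belong to the same household (several rows can be members of one household).
In each household, one person is the head of household; the others are ordinary household members.
The project requirements state that only heads of household are used for scoring.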
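To make the data layout concrete, here is a minimal sketch of it in pandas. The column names `household_id`, `is_head`, `age` and `poverty_level` are placeholders I made up for illustration, and the values are toy data, not rows from the real dataset.

```python
import pandas as pd

# Toy data (hypothetical names and values): one row per person, with the
# household-level target repeated on every row of the same household.
df = pd.DataFrame({
    "household_id":  ["h1", "h1", "h1", "h2", "h2", "h3"],
    "is_head":       [1, 0, 0, 1, 0, 1],
    "age":           [45, 43, 12, 30, 2, 67],
    "poverty_level": [2, 2, 2, 4, 4, 1],
})

# Only heads of household are scored, so there should be exactly one per household.
heads = df[df["is_head"] == 1]
assert heads["household_id"].is_unique
print(heads)
```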
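After cleaning the dataset, my next plan is
However, I am not sure whether performing target encoding before aggregating the data for household members is a good idea.
This is how I performed the target encoding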
These are the meanings of the one-hot encoded columns:
abastaguadentro: 1 if water provision is inside the dwelling
abastaguafuera: 1 if water provision is outside the dwelling
abastaguano: 1 if there is no water provision
public: 1 if electricity comes from CNFL, ICE, ESPH/JASEC
planpri: 1 if electricity comes from a private plant
noelec: 1 if there is no electricity in the dwelling
coopele: 1 if electricity comes from a cooperative
sanitario1: 1 if there is no toilet in the dwelling
sanitario2: 1 if the toilet is connected to a sewer or cesspool
sanitario3: 1 if the toilet is connected to a septic tank
sanitario5: 1 if the toilet is connected to a black hole or latrine
sanitario6: 1 if the toilet is connected to another system
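Before a group of one-hot columns can be target encoded, it usually has to be collapsed into a single categorical column. A minimal sketch of that step, assuming each row has exactly one flag set per group; the group names `water` and `toilet` and the toy values are made up for illustration, only the column names come from the list above.

```python
import pandas as pd

# Toy rows using the column names listed above; the values are made up.
df = pd.DataFrame({
    "abastaguadentro": [1, 0, 0],
    "abastaguafuera":  [0, 1, 0],
    "abastaguano":     [0, 0, 1],
    "sanitario1": [0, 0, 1],
    "sanitario2": [1, 0, 0],
    "sanitario3": [0, 1, 0],
    "sanitario5": [0, 0, 0],
    "sanitario6": [0, 0, 0],
})

# Hypothetical group names mapped to their one-hot columns.
groups = {
    "water": ["abastaguadentro", "abastaguafuera", "abastaguano"],
    "toilet": ["sanitario1", "sanitario2", "sanitario3", "sanitario5", "sanitario6"],
}

for name, cols in groups.items():
    # Guard the assumption: exactly one flag is set per row within a group.
    assert (df[cols].sum(axis=1) == 1).all(), f"{name}: rows without exactly one flag"
    # idxmax(axis=1) returns the label of the column that holds the 1.
    df[name] = df[cols].idxmax(axis=1)

print(df[["water", "toilet"]])
```

The electricity columns (public, planpri, noelec, coopele) could be added to `groups` in the same way, provided the one-flag-per-row assumption holds for them.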
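My question is: can I aggregate the data for household members after target encoding, before merging it with the heads of household?
More information about my project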
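To make the ordering question concrete, here is a minimal sketch of the alternative order (aggregate, merge with the heads, then encode), next to an encoding computed on person rows. All column names and values are hypothetical toy data, and the encoding is a plain per-category mean of the target, which treats the four poverty classes as ordinal numbers; a real pipeline would fit the encoding on training folds only to avoid target leakage.

```python
import pandas as pd

# Hypothetical toy data: one row per person, target repeated within a household.
df = pd.DataFrame({
    "household_id":  ["h1", "h1", "h2", "h2", "h2", "h3", "h4"],
    "is_head":       [1, 0, 1, 0, 0, 1, 1],
    "age":           [40, 38, 30, 5, 2, 70, 25],
    "toilet":        ["sanitario2", "sanitario2", "sanitario3", "sanitario3",
                      "sanitario3", "sanitario2", "sanitario3"],
    "poverty_level": [4, 4, 2, 2, 2, 3, 1],
})

# 1) Aggregate person-level features over all members of each household.
agg = (
    df.groupby("household_id")
      .agg(n_members=("age", "size"), mean_age=("age", "mean"))
      .reset_index()
)

# 2) Keep one row per household (the head) and attach the aggregates.
heads = df.loc[df["is_head"] == 1, ["household_id", "toilet", "poverty_level"]]
households = heads.merge(agg, on="household_id", how="left", validate="one_to_one")

# 3) Target-encode on household rows, so every household counts once.
household_te = households.groupby("toilet")["poverty_level"].mean()
households["toilet_te"] = households["toilet"].map(household_te)

# For comparison: encoding on person rows weights large households more heavily.
person_te = df.groupby("toilet")["poverty_level"].mean()

print(household_te)
print(person_te)
```

In this toy data the two encodings differ, because households with more members contribute more rows to the person-level means. That difference is the effect I am unsure about.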