How do I create a new df that contains only the variables converted to dummies?

Posted 2024-10-02 06:32:51


These are the first 5 lines of the CSV file with my data:

df=school;sex;age;address;famsize;Pstatus;Medu;Fedu;Mjob;Fjob;reason;guardian;traveltime;studytime;failures;schoolsup;famsup;paid;activities;nursery;higher;internet;romantic;famrel;freetime;goout;Dalc;Walc;health;absences;G1;G2;G3
"GP";"F";18;"U";"GT3";"A";4;4;"at_home";"teacher";"course";"mother";2;2;0;"yes";"no";"no";"no";"yes";"yes";"no";"no";4;3;4;1;1;3;6;"5";"6";6
"GP";"F";17;"U";"GT3";"T";1;1;"at_home";"other";"course";"father";1;2;0;"no";"yes";"no";"no";"no";"yes";"yes";"no";5;3;3;1;1;3;4;"5";"5";6
"GP";"F";15;"U";"LE3";"T";1;1;"at_home";"other";"other";"mother";1;2;3;"yes";"no";"yes";"no";"yes";"yes";"yes";"no";4;3;2;2;3;3;10;"7";"8";10
"GP";"F";15;"U";"GT3";"T";4;2;"health";"services";"home";"mother";1;3;0;"no";"yes";"yes";"yes";"yes";"yes";"yes";"yes";3;2;2;1;1;5;2;"15";"14";15

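import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

# Title-case the column names (e.g. 'school' -> 'School', 'Pstatus' stays 'Pstatus')
df.columns = df.columns.str.title()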

dummies=pd.get_dummies(df[['School','Sex', 'Address','Famsize','Pstatus', 'Mjob','Fjob','Reason','Guardian','Schoolsup', 'Famsup','Paid','Activities','Nursery','Higher','Internet','Romantic']], drop_first=True)

display(dummies.head())

df1=df.join(dummies)

df1=df1.drop(['School','Sex', 'Address', 'Famsize','Pstatus', 'Mjob', 'Fjob', 'Reason', 'Guardian', 'Schoolsup', 'Famsup', 'Paid', 'Activities', 'Nursery', 'Higher', 'Internet', 'Romantic'])

KeyError: "['School' 'Sex' 'Address' 'Famsize' 'Pstatus' 'Mjob' 'Fjob' 'Reason' 'Guardian'\n 'Schoolsup' 'Famsup' 'Paid' 'Activities' 'Nursery' 'Higher' 'Internet'\n 'Romantic'] not found in axis"

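After joining the dummy variables to the original variables, I want to drop the original variables that are not coded as 0 and 1. How do I do that?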
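One likely cause of the KeyError: `DataFrame.drop` defaults to `axis=0`, so it looks for those names among the row labels rather than the columns. Passing the names via `columns=` (or adding `axis=1`) should avoid it. The sketch below is a minimal illustration, not the asker's full pipeline: the small `df` is a stand-in built from a few values in the sample rows, and `cat_cols` is a hypothetical name introduced here for the list of categorical columns.

```python
import pandas as pd

# Stand-in for the question's data: three categorical columns and one
# numeric column, with values taken from the four sample rows.
df = pd.DataFrame({
    "Famsize": ["GT3", "GT3", "LE3", "GT3"],
    "Pstatus": ["A", "T", "T", "T"],
    "Mjob": ["at_home", "at_home", "at_home", "health"],
    "Age": [18, 17, 15, 15],
})

# Hypothetical helper name: the columns to turn into dummies.
cat_cols = ["Famsize", "Pstatus", "Mjob"]

# `dummies` on its own is already a DataFrame that holds only the dummy columns.
dummies = pd.get_dummies(df[cat_cols], drop_first=True)

# Join the dummies, then drop the originals by column name.
# `columns=` tells drop() to look in the columns instead of the row index.
df1 = df.join(dummies).drop(columns=cat_cols)

# Alternative: let get_dummies replace the listed columns in a single step.
df2 = pd.get_dummies(df, columns=cat_cols, drop_first=True)

print(df1.columns.tolist())
print(df2.columns.tolist())
```

If only the dummy columns are wanted, `dummies` can be used directly. If the numeric columns should be kept alongside them, either `df1` or `df2` should serve.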

