Creating binary categorical variables in pandas by merging 4 categories into 2

Published 2024-10-03 17:28:11

I have the following data frame, where the 'Status' column holds four categorical values: 'Open', 'Closed', 'Solved', and 'Pending'.

0   250635                      Comcast Cable Internet Speeds  22-04-15   
1   223441       Payment disappear - service got disconnected  04-08-15   
2   242732                                  Speed and Service  18-04-15   
3   277946  Comcast Imposed a New Usage Cap of 300GB that ...  05-07-15   
4   307175         Comcast not working and no service to boot  26-05-15   

  Date_month_year         Time        Received Via      City     State  \
0       22-Apr-15   3:53:50 PM  Customer Care Call  Abingdon  Maryland   
1       04-Aug-15  10:22:56 AM            Internet   Acworth   Georgia   
2       18-Apr-15   9:55:47 AM            Internet   Acworth   Georgia   
3       05-Jul-15  11:59:35 AM            Internet   Acworth   Georgia   
4       26-May-15   1:25:26 PM            Internet   Acworth   Georgia   

   Zip code  Status Filing on Behalf of Someone  
0     21009  Closed                          No  
1     30102  Closed                          No  
2     30101  Closed                         Yes  
3     30101    Open                         Yes  
4     30101  Solved                          No  

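I want to combine the 'Open' and 'Pending' categories into an 'Open' column, and the 'Closed' and 'Solved' categories into a 'Closed' column, each holding binary 0/1 values. If I use pd.get_dummies(df, columns=['Status']), I get the output below, with 4 new columns for the 4 values, but I only need 2, as described above. I couldn't find an earlier thread on this here, so please suggest any possible approach. Thanks, everyone.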

0          22-Apr-15   3:53:50 PM  Customer Care Call    Abingdon  Maryland   
1          04-Aug-15  10:22:56 AM            Internet     Acworth   Georgia   
2          18-Apr-15   9:55:47 AM            Internet     Acworth   Georgia   
3          05-Jul-15  11:59:35 AM            Internet     Acworth   Georgia   
4          26-May-15   1:25:26 PM            Internet     Acworth   Georgia   
             ...          ...                 ...         ...       ...   
2219       04-Feb-15   9:13:18 AM  Customer Care Call  Youngstown   Florida   
2220       06-Feb-15   1:24:39 PM  Customer Care Call   Ypsilanti  Michigan   
2221       06-Sep-15   5:28:41 PM            Internet   Ypsilanti  Michigan   
2222       23-Jun-15  11:13:30 PM  Customer Care Call   Ypsilanti  Michigan   
2223       24-Jun-15  10:28:33 PM  Customer Care Call   Ypsilanti  Michigan   

      Zip code Filing on Behalf of Someone  Status_Closed  Status_Open  \
0        21009                          No              1            0   
1        30102                          No              1            0   
2        30101                         Yes              1            0   
3        30101                         Yes              0            1   
4        30101                          No              0            0   
       ...                         ...            ...          ...   
2219     32466                          No              1            0   
2220     48197                          No              0            0   
2221     48197                          No              0            0   
2222     48197                          No              0            0   
2223     48198                         Yes              0            1   

      Status_Pending  Status_Solved  
0                  0              0  
1                  0              0  
2                  0              0  
3                  0              0  
4                  0              1  
             ...            ...  
2219               0              0  
2220               0              1  
2221               0              1  
2222               0              1  
2223               0              0  

3 answers

Here you go:

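# Start with both indicator columns set to 0
df['Status_open'] = 0
df['Status_closed'] = 0
# Set the indicator to 1 on the rows whose Status belongs to each group
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Pending'), 'Status_open'] = 1
df.loc[(df['Status'] == 'Closed') | (df['Status'] == 'Solved'), 'Status_closed'] = 1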

(Not tested; I'm not at a PC.)
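The two .loc assignments above can also be written with Series.isin, which may read more easily as the lists of statuses grow. This is only a sketch on a toy frame; the four sample rows are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Toy data standing in for the real complaints table (illustrative values only).
df = pd.DataFrame({"Status": ["Closed", "Open", "Solved", "Pending"]})

# isin() returns a boolean Series; astype(int) turns True/False into 1/0.
df["Status_open"] = df["Status"].isin(["Open", "Pending"]).astype(int)
df["Status_closed"] = df["Status"].isin(["Closed", "Solved"]).astype(int)

print(df)
```

A row whose Status is in neither list gets 0 in both columns, which matches the behaviour of the .loc version.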

I think you can do it like this:

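open_ls = ['Open', 'Pending']
# Anything not in open_ls (including unexpected or missing values) is labelled 'Closed'
df['New_Status'] = df['Status'].apply(lambda x: 'Open' if x in open_ls else 'Closed')
# get_dummies returns a new DataFrame, so assign the result
df = pd.get_dummies(df, columns=['New_Status'])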
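A variation on the same idea, sketched under the assumption that you would rather list all four statuses explicitly: map each status to its group with a dictionary, then call get_dummies on the grouped column. The names status_groups and result, and the toy rows, are hypothetical and chosen only for this example.

```python
import pandas as pd

# Toy data standing in for the real complaints table (illustrative values only).
df = pd.DataFrame({"Status": ["Closed", "Open", "Solved", "Pending"]})

# Hypothetical mapping from each original status to its merged group.
status_groups = {
    "Open": "Open",
    "Pending": "Open",
    "Closed": "Closed",
    "Solved": "Closed",
}

# map() leaves NaN for any status missing from the dictionary,
# instead of silently labelling it 'Closed'.
df["New_Status"] = df["Status"].map(status_groups)

# dtype=int requests 0/1 columns rather than booleans.
result = pd.get_dummies(df, columns=["New_Status"], dtype=int)

print(result)
```

Unlike the lambda version, an unexpected status ends up with 0 in both dummy columns here, which makes such rows easier to spot.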

Here is the basic idea:

# Create the target columns first, defaulting to False
df['Open'] = False
df['Closed'] = False

for i, row in df.iterrows():
    # These are substring checks on the Status text
    if 'Open' in row['Status'] or 'Pending' in row['Status']:
        df.at[i, 'Open'] = True  # or any other value
    if 'Closed' in row['Status'] or 'Solved' in row['Status']:
        df.at[i, 'Closed'] = True  # or any other value

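Iterate over the rows and check the value in the 'Status' column; if it matches, set a boolean in the new 'Open' (or 'Closed') column. Of course, you need to create the 'Open' and 'Closed' columns before doing this.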
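For larger frames, a row-by-row loop tends to be slow. The same substring check can be sketched without a loop using Series.str.contains, assuming 'Status' holds strings; the toy rows below are made up for illustration.

```python
import pandas as pd

# Toy data standing in for the real complaints table (illustrative values only).
df = pd.DataFrame({"Status": ["Closed", "Open", "Solved", "Pending"]})

# str.contains() does a regex substring match on every row at once;
# "Open|Pending" matches either word, and na=False treats missing values as no match.
df["Open"] = df["Status"].str.contains("Open|Pending", na=False)
df["Closed"] = df["Status"].str.contains("Closed|Solved", na=False)

print(df)
```

This yields boolean columns, like the loop; append .astype(int) if 0/1 values are needed.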
