Delete specific rows based on certain conditions in a table

Posted 2024-09-29 20:21:46


I want to clean up my data. Basically, this is what I have.

DataFrame:

import pandas as pd

# One row per scraped fragment; each user's values are scattered across several rows
d = {
    'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney', 'Alley Huff', 'Alley Huff', 'Alley Huff',
             'Raedden Grip', 'Raedden Grip', 'S.Sarkar', 'S.Sarkar', 'S.Sarkar'],
    'Work': ['', '', '', 'College', 'College', 'College', '', '', 'Business', 'Business', 'Business'],
    'Country': ['Aus', 'Aus', 'Australia', 'US', 'US', 'US', 'Ban', 'Ban', 'Ind', 'Ind', 'Ind'],
    'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
    'Training': ['', 'Internal', '', '', 'External', '', '', '', '', 'Internal', ''],
    'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', ''],
}
df = pd.DataFrame(data=d)
df

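Here, most cells in each row are empty and every user's data is scattered across several rows. I want to merge each user's values into a single row and drop the unnecessary duplicate rows.

My output should be: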

d = {
    'User': ['Mansi kinney', 'Alley Huff', 'Raedden Grip', 'S.Sarkar'],
    'Work': ['', 'College', '', 'Business'],
    'Country': ['Aus', 'US', 'Ban', 'Ind'],
    'Dept': ['Safety', '', '', ''],
    'Training': ['Internal', 'External', '', 'Internal'],
    'Status': ['Active', 'Active', 'Active', 'Active'],
}
df = pd.DataFrame(data=d)
df

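I typed all of this on my smartphone, so please let me know if the question is unclear. Please help me clean the data and get the desired output. Thanks in advance.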


Tags: data, df, business, active, internal, us, ind, safety
1 answer
User
#1 · Posted 2024-09-29 20:21:46

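You can group by 'User' and aggregate with ''.join, calling unique() first so that duplicate values within each group are dropped before they are joined: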

df = df.groupby('User').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)

print(df)  
#output:
           User      Work       Country    Dept  Training  Status
0    Alley Huff   College            US          External  Active
1  Mansi kinney            AusAustralia  Safety  Internal  Active
2  Raedden Grip                     Ban                    Active
3      S.Sarkar  Business           Ind          Internal  
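Note that in the output above, the Country for Mansi kinney comes out as 'AusAustralia', because ''.join concatenates every distinct value in the group. If a single value per cell is preferred, one possible variation (not part of the original answer) is to keep only the first non-empty value in each group. The helper name `first_non_empty` is hypothetical, and `sort=False` is used here only to keep users in their original order. This is a minimal sketch under those assumptions:

```python
import pandas as pd

d = {
    'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney', 'Alley Huff', 'Alley Huff', 'Alley Huff',
             'Raedden Grip', 'Raedden Grip', 'S.Sarkar', 'S.Sarkar', 'S.Sarkar'],
    'Work': ['', '', '', 'College', 'College', 'College', '', '', 'Business', 'Business', 'Business'],
    'Country': ['Aus', 'Aus', 'Australia', 'US', 'US', 'US', 'Ban', 'Ban', 'Ind', 'Ind', 'Ind'],
    'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
    'Training': ['', 'Internal', '', '', 'External', '', '', '', '', 'Internal', ''],
    'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', ''],
}
df = pd.DataFrame(data=d)


def first_non_empty(values: pd.Series) -> str:
    """Return the first non-empty string in the group, or '' if there is none."""
    non_empty = values[values != '']
    return non_empty.iloc[0] if not non_empty.empty else ''


# Hypothetical helper applied to every non-key column, one group per user.
# sort=False keeps the users in order of first appearance.
result = df.groupby('User', sort=False).agg(first_non_empty).reset_index()
print(result)
```

With the sample data, S.Sarkar's Status stays empty, since none of that user's input rows contains a Status value.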

Update: here is the same approach applied to your full data

import pandas as pd
df = pd.read_excel('your_data.xls')
df = df.groupby('user').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)
pd.set_option('display.max_columns', None)
print(df)

#output:
                   user                                              works  \
0        Abhishek Mitra  Director & CEO | INDIAN CYBER SECURITY SOLUTIO...   
1         Anandita Kaul       HR Associate - Recruitments at Pulp Strategy   
2         Glam Sorvey.B    Data Science & Analytics with Python Consultant   
3   Madhurima S. Sarkar  MBA in Business Analytics and Finance || The S...   
4         Mansi Makhija                                    Works at Amazon   
5       NanaoSana Singh  DevOps | AWS | Docker | Kubernetes | Git | Jen...   
6         Neeraj Mishra                               DGM at Pulp Strategy   
7       Niral Shahpatel  Scale and Strategy Lead - Global Partner Marke...   
8      Sandhya Ramagiri     Technical Program Manager at Intel Corporation   
9         Sarthak Ahuja         Associate Account Manager at Pulp Strategy   
10         UDIT NARAYAN          Student at Narula Institute Of Technology   

                             country  
0        Kolkata, West Bengal, India  
1          South Delhi, Delhi, India  
2    Portland, Oregon, United States  
3        Kolkata, West Bengal, India  
4        Noida, Uttar Pradesh, India  
5           Pune, Maharashtra, India  
6                              India  
7   Hillsboro, Oregon, United States  
8    Austin, Texas Metropolitan Area  
9                       Delhi, India  
10       Domchanch, Jharkhand, India 
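One caveat when the data comes from a spreadsheet: `pd.read_excel` loads blank cells as NaN rather than as empty strings, and ''.join raises a TypeError if a group contains a NaN. A possible alternative in that situation is `GroupBy.first()`, which returns the first non-null value per column in each group. The small DataFrame below is a made-up stand-in for the Excel file (its names and values are hypothetical), so treat this as a sketch rather than a tested fix for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet: blank cells arrive as NaN
df = pd.DataFrame({
    'user': ['A', 'A', 'B', 'B'],
    'works': [np.nan, 'Works at X', 'Student', np.nan],
    'country': ['India', np.nan, np.nan, np.nan],
})

# first() skips nulls, so each user ends up with one row holding
# the first non-null value found in every column
result = df.groupby('user', sort=False).first().reset_index()

# Columns with no value at all for a user are still null; show them as ''
result = result.fillna('')
print(result)
```

If the blanks are empty strings instead of NaN, they can be converted first with `df.replace('', np.nan)` so that `first()` skips them as well.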
