How do I convert rows to columns after grouping (with custom names)?

employee_id  salary  other1      other2      other3
1            50000   somedata1   somedata2   somedata3
1            48000   somedata1   somedata2   somedata3
2            80000   somedata20  somedata21  somedata22
2            77000   somedata20  somedata21  somedata22
2            75000   somedata20  somedata21  somedata22
2            74000   somedata20  somedata21  somedata22
3            60000   somedata30  somedata31  somedata32

employee_id  salary  prevsalary1  prevsalary2  prevsalary3  other1      other2      other3
1            50000   48000        48000        48000        somedata1   somedata2   somedata3
2            80000   77000        75000        74000        somedata20  somedata21  somedata22
3            60000   60000        60000        60000        somedata30  somedata31  somedata32

import pandas as pd

df = pd.DataFrame({'emp_id':[1,1,2,2,2,2,3],'salary':[50000,48000,80000,77000,75000,74000,60000]})
df['other1'] =['somedata1','somedata1','somedata20','somedata20','somedata20','somedata20','somedata30']
df['other2'] = df['other1'].apply(lambda x: x+'1')
df['other3'] = df['other1'].apply(lambda x: x+'2')
df
Out[59]: 
   emp_id  salary      other1       other2       other3
0       1   50000   somedata1   somedata11   somedata12
1       1   48000   somedata1   somedata11   somedata12
2       2   80000  somedata20  somedata201  somedata202
3       2   77000  somedata20  somedata201  somedata202
4       2   75000  somedata20  somedata201  somedata202
5       2   74000  somedata20  somedata201  somedata202
6       3   60000  somedata30  somedata301  somedata302

2 Answers

User

#1 · Edited 2024-10-03 21:34:08

Pivot the salary column first, then merge the result with the non-salary data.

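# first create a copy of the dataset without the salary column (one row per employee)
dataset_without_salaries = df.drop('salary', axis=1).drop_duplicates()
# pivot only the salary column, collecting each employee's salaries into a list
temp = pd.pivot_table(data=df, index='employee_id', values='salary', aggfunc=list)
# expand each list into its own columns (0, 1, 2, ...); shorter lists are padded with NaN
temp2 = temp.apply(lambda x: pd.Series(x['salary']), axis=1)
# merge the two together on employee_id, which is the index of temp2
final = pd.merge(temp2.reset_index(), dataset_without_salaries, on='employee_id')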
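
For reference, here is a minimal self-contained sketch of the same "collect the salaries, expand, merge" idea that also applies the custom column names from the question. It is a variant, not the exact code above: it swaps `pivot_table` for `groupby(...).agg(list)`, forward-fills missing previous salaries as the expected output does, and keeps only `other1` for brevity. The names `salary_lists`, `wide` and `others` are made up for this example.

```python
import pandas as pd

# Sample data from the question (only other1 kept, to stay short)
df = pd.DataFrame({
    'employee_id': [1, 1, 2, 2, 2, 2, 3],
    'salary': [50000, 48000, 80000, 77000, 75000, 74000, 60000],
    'other1': ['somedata1', 'somedata1', 'somedata20', 'somedata20',
               'somedata20', 'somedata20', 'somedata30'],
})

# Collect each employee's salaries into a list, keeping the row order
salary_lists = df.groupby('employee_id')['salary'].agg(list)

# Expand the lists into columns 0..n-1; shorter lists are padded with NaN
wide = pd.DataFrame(salary_lists.tolist(), index=salary_lists.index)

# Fill the gaps with the last known salary, then restore integer dtype
wide = wide.ffill(axis=1).astype(int)

# Custom names: first column is the current salary, the rest are previous ones
wide.columns = ['salary'] + [f'prevsalary{i}' for i in range(1, wide.shape[1])]

# One row per employee for the non-salary columns, then merge on the key
others = df.drop(columns='salary').drop_duplicates()
final = wide.reset_index().merge(others, on='employee_id')
print(final)
```

Note that `drop_duplicates()` only yields one row per employee when the non-salary columns really are constant within each employee, as they are in the sample data.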

User

#2 · Edited 2024-10-03 21:34:08

One approach is to use groupby(...).cumcount() together with pivot_table:

g = df.groupby('employee_id')
cols = g.salary.cumcount()
out = df.pivot_table(index='employee_id', values='salary', columns=cols).ffill(axis=1)
# Create list of column names matching the expected output
out.columns = ['salary'] + [f'prevsalary{i}' for i in range(1,len(out.columns))]

print(out)
             salary  prevsalary1  prevsalary2  prevsalary3
employee_id                                                
1            50000.0      48000.0      48000.0      48000.0
2            80000.0      77000.0      75000.0      74000.0
3            60000.0      60000.0      60000.0      60000.0

Now we only need to join the unique other columns from the original dataframe:

out = out.join(df.filter(like='other').groupby(df.employee_id).first())

print(out)

             salary    prevsalary1  prevsalary2  prevsalary3      other1  \
employee_id                                                               
1            50000.0      48000.0      48000.0      48000.0   somedata1   
2            80000.0      77000.0      75000.0      74000.0  somedata20   
3            60000.0      60000.0      60000.0      60000.0  somedata30   

                 other2      other3  
employee_id                          
1             somedata2   somedata3  
2            somedata21  somedata22  
3            somedata31  somedata32
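
The two steps above can be wrapped into one small function. This is only a sketch under a few assumptions: `widen_salaries` and the temporary `_pos` column are hypothetical names, `DataFrame.pivot` is used instead of `pivot_table` (each employee/position pair is unique, so no aggregation is needed), and the salaries are cast back to their original integer dtype after the forward fill.

```python
import pandas as pd

def widen_salaries(df, key='employee_id', value='salary'):
    """Hypothetical helper: one row per key, salaries spread across columns."""
    # Position of each row within its group: 0 for the first salary, 1 for the next, ...
    pos = df.groupby(key).cumcount()

    # Reshape long -> wide; pivot does a pure reshape, no aggregation
    wide = df.assign(_pos=pos).pivot(index=key, columns='_pos', values=value)

    # Employees with fewer salaries get NaN; carry the last known value forward
    wide = wide.ffill(axis=1).astype(df[value].dtype)

    # Custom names: 'salary', 'prevsalary1', 'prevsalary2', ...
    wide.columns = [value] + [f'prev{value}{i}' for i in range(1, wide.shape[1])]

    # First value of every remaining column per key, joined on the shared index
    rest = df.drop(columns=value).groupby(key).first()
    return wide.join(rest).reset_index()

# Sample data from the question
df = pd.DataFrame({
    'employee_id': [1, 1, 2, 2, 2, 2, 3],
    'salary': [50000, 48000, 80000, 77000, 75000, 74000, 60000],
    'other1': ['somedata1', 'somedata1', 'somedata20', 'somedata20',
               'somedata20', 'somedata20', 'somedata30'],
})
df['other2'] = df['other1'] + '1'
df['other3'] = df['other1'] + '2'

result = widen_salaries(df)
print(result)
```

Casting back with `astype` avoids the `50000.0`-style floats shown in the printed output above; it is safe here because the first salary column never contains NaN, so the forward fill leaves no gaps.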
