我有一个数据框,里面有很多地方的信息。你知道吗
import pandas as pd
d = ({
'C' : ['08:00:00','XX','08:10:00','XX','08:41:42','XX','08:50:00','XX', '09:00:00', 'XX','09:15:00','XX','09:21:00','XX','09:30:00','XX','09:40:00','XX'],
'D' : ['Home','','Home','','Away','','Home','','Away','','Home','','Home','','Away','','Home',''],
'E' : ['Num:','','Num:','','Num:','','Num:','','Num:', '','Num:','','Num:','','Num:', '','Num:', ''],
'F' : ['1','','1','','1','','1','','1', '','2','','2','','1', '','2',''],
'A' : ['A','','A','','A','','A','','A','','A','','A','','A','','A',''],
'B' : ['Stop','','Res','','Stop','','Start','','Res','','Stop','','Res','','Start','','Start','']
})
df = pd.DataFrame(data=d)
我想将这些数据导出到它们各自的位置,这些位置用Column D
标记。我还想添加基于Column B
中标记的函数的新列。你知道吗
df['C'] = pd.to_timedelta(df['C'], errors="coerce").dt.total_seconds()
incl = ['Home', 'Away']
for k, g in df[df.D.isin(incl)].groupby('D'):
Stop = g.loc[df['B'] == 'Stop'].reset_index()['C']
Start = g.loc[df['B'] == 'Start'].reset_index()['C']
Res = g.loc[df['B'] == 'Res'].reset_index()['C']
g['Start_diff'] = Start - Stop
g['Res_diff'] = Start - Res
问题是这些函数多次出现,它们在Column F
中被标记。因此,如果我们查看Home
的导出,我们第一次得到Column F
中的diff。你知道吗
输出:
A B C D E F Start_diff Res_diff
0 A Stop 28800 Home Num: 1 3000 2400
2 A Res 29400 Home Num: 1
6 A Start 31800 Home Num: 1
10 A Stop 33300 Home Num: 2
12 A Res 33660 Home Num: 2
16 A Start 34800 Home Num: 2
我希望预期的结果是:
A B C D E F Start_diff Res_diff
0 A Stop 28800 Home Num: 1 3000 2400
2 A Res 29400 Home Num: 1
6 A Start 31800 Home Num: 1
10 A Stop 33300 Home Num: 2 1500 1200
12 A Res 33660 Home Num: 2
16 A Start 34800 Home Num: 2
我试过把for k, g in df[df.D.isin(incl)].groupby('D'):
改成for k, g in df[df.D.isin(incl)].groupby('D').F.nunique():
但是我得到一个错误TypeError: 'numpy.int64' object is not iterable
我认为需要使用groupby
D
和F
列的自定义函数,用mask
替换重复的值:相关问题 更多 >
编程相关推荐