How to speed up a very slow Pandas apply function?

def proc_trader(data):
    data['_seq'] = np.nan
    # mark every ending of a roundtrip with its index
    data.loc[data.cumq == 0, 'tag'] = np.arange(1, (data.cumq == 0).sum() + 1)
    # backfill the roundtrip index until the previous roundtrip;
    # then fill the rest with 0s (roundtrip incomplete for most recent trades)
    data['_seq'] = data['tag'].bfill().fillna(0)
    return data['_seq']
    # btw, why on earth does this function return a dataframe instead of the series `data['_seq']`??

import pandas as pd
import numpy as np

reshaped = pd.DataFrame({'trader': ['a','a','a','a','a','a','a'],
                         'stock':  ['a','a','a','a','a','a','b'],
                         'day':    [0,1,2,4,5,10,1],
                         'delta':  [10,-10,15,-10,-5,5,0],
                         'out':    [1,1,2,2,2,0,1]})
reshaped.sort_values(by=['trader', 'stock','day'], inplace=True)
reshaped['cumq'] = reshaped.groupby(['trader', 'stock']).delta.transform('cumsum')
reshaped['_spell'] = reshaped.groupby(['trader','stock'])[['cumq']].apply(proc_trader).reset_index()['_seq']
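To see what proc_trader computes inside one group, here is a minimal sketch on the cumq values of the (trader 'a', stock 'a') group from the sample data. It only illustrates a single group, not the full groupby. It uses .loc and .bfill() in place of .ix and fillna(method='bfill'), since .ix was removed in pandas 1.0 and fillna(method=...) is deprecated since pandas 2.1.

```python
import numpy as np
import pandas as pd

# cumq of the (trader 'a', stock 'a') group: cumulative sum of 10, -10, 15, -10, -5, 5
cumq = pd.Series([10, 0, 15, 5, 0, 5])

ends = cumq.eq(0)                             # rows where a roundtrip closes
tag = pd.Series(np.nan, index=cumq.index)
tag.loc[ends] = np.arange(1, ends.sum() + 1)  # number the ends 1, 2, ...
seq = tag.bfill().fillna(0)                   # backfill; trailing incomplete rows -> 0

print(seq.tolist())  # [1.0, 1.0, 2.0, 2.0, 2.0, 0.0]
```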

1 answer

User

#1 · Posted 2024-09-27 20:17:26

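Nothing special here, just a few tweaks in places. You don't actually need to apply a function, so I didn't. On this small sample data it runs roughly twice as fast as the original.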
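The "twice as fast" figure comes from a 7-row sample. Because apply calls a Python function once per (trader, stock) group, the gap is better measured on larger data. Below is a rough timing sketch under stated assumptions: the data is synthetic and random (row count, number of traders and stocks, and delta values are arbitrary), and the two helpers are hypothetical. per_group reproduces the question's numbering logic and is run through groupby(...).transform rather than apply, so its result aligns with the frame's index; vectorized follows the answer's mark, cumsum, bfill steps. Timings depend on the machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data (assumption): sizes and value ranges are arbitrary, for timing only.
rng = np.random.default_rng(0)
n = 50_000
df = pd.DataFrame({
    "trader": rng.integers(0, 20, n),
    "stock": rng.integers(0, 10, n),
    "day": rng.integers(0, 365, n),
    "delta": rng.choice([-10, -5, 5, 10], n),
})
df = df.sort_values(["trader", "stock", "day"], kind="stable").reset_index(drop=True)
keys = ["trader", "stock"]
df["cumq"] = df.groupby(keys)["delta"].cumsum()


def per_group(cumq: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring proc_trader for one group: number each
    roundtrip end, backfill, and give trailing incomplete trades 0."""
    tag = pd.Series(np.nan, index=cumq.index)
    ends = cumq.eq(0)
    tag.loc[ends] = np.arange(1, ends.sum() + 1)
    return tag.bfill().fillna(0)


def vectorized(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper with the answer's steps: mark ends with 1,
    cumsum per group (NaN rows are skipped), then backfill per group."""
    group_keys = [frame["trader"], frame["stock"]]
    spell = pd.Series(np.where(frame["cumq"].eq(0), 1.0, np.nan), index=frame.index)
    spell = spell.groupby(group_keys).cumsum()
    return spell.groupby(group_keys).bfill().fillna(0)


t0 = time.perf_counter()
slow = df.groupby(keys)["cumq"].transform(per_group)
t1 = time.perf_counter()
fast = vectorized(df)
t2 = time.perf_counter()

print(f"per-group function: {t1 - t0:.3f} s")
print(f"vectorized:         {t2 - t1:.3f} s")
```

Both versions should produce identical values; the per-group version's cost scales with the number of groups, while the vectorized one makes a fixed number of groupby passes.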

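reshaped.sort_values(by=['trader', 'stock','day'], inplace=True)
reshaped['cumq'] = reshaped.groupby(['trader', 'stock']).delta.cumsum()
# mark each roundtrip end with 1 (all other rows stay NaN)
reshaped.loc[reshaped.cumq == 0, '_spell'] = 1
# running count of ends within each group; cumsum skips the NaN rows
reshaped['_spell'] = reshaped.groupby(['trader','stock'])['_spell'].cumsum()
# backfill within each group; incomplete trailing trades get 0
reshaped['_spell'] = reshaped.groupby(['trader','stock'])['_spell'].bfill().fillna(0)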
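This works without a helper function because groupby cumsum skips NaN values: rows marked 1 turn into a running count of roundtrip ends within each group, while unmarked rows stay NaN until the groupwise bfill fills them. A minimal sketch on a hypothetical two-group frame, where the column name key is made up to stand in for the (trader, stock) pair:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: group 'x' closes roundtrips at rows 1 and 4, group 'y' at row 6
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "x", "x", "y"],
    "cumq": [10, 0, 15, 5, 0, 5, 0],
})

# 1.0 at each roundtrip end, NaN elsewhere
marks = pd.Series(np.where(df["cumq"].eq(0), 1.0, np.nan), index=df.index)

# cumsum skips NaN: ends become 1, 2, ... within each group, other rows stay NaN
running = marks.groupby(df["key"]).cumsum()

# backfill within each group only, then 0 for trailing incomplete trades
spell = running.groupby(df["key"]).bfill().fillna(0)

print(running.tolist())  # [nan, 1.0, nan, nan, 2.0, nan, 1.0]
print(spell.tolist())    # [1.0, 1.0, 2.0, 2.0, 2.0, 0.0, 1.0]
```

Grouping the bfill matters: an ungrouped bfill would give the last 'x' row the value 1 from the 'y' row instead of 0.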

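Result: on the sample data the `_spell` column comes out as 1, 1, 2, 2, 2, 0, 1 (stored as floats), the same values as the `out` column.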
