Improving Pandas performance when using the apply method

Posted 2024-06-26 17:42:03


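I am using pandas for performance-sensitive computation. For 50,000 rows, timing the code below reports 1 loop, best of 5: 7.24 s per loop.

I need to scale this to one million rows.

How can I vectorize the function so that it operates on all rows at once and overall performance improves?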

from datetime import datetime

def weightedFlowAmt(startDate,endDate,tradeDate,tradeAmt):
  # Parse the three date strings (format YYYY-MM-DD)
  startInDays = datetime.strptime(startDate, "%Y-%m-%d")
  endInDays = datetime.strptime(endDate, "%Y-%m-%d")
  tradeInDays = datetime.strptime(tradeDate, "%Y-%m-%d")
  differenceTradeAndEnd=abs((endInDays - tradeInDays).days)
  differenceStartAndEnd=abs((endInDays - startInDays).days)
  weighted_FlowAmt = (tradeAmt * differenceTradeAndEnd)/differenceStartAndEnd
  # Return the result; without this, apply would fill 'flow' with None
  return weighted_FlowAmt

mutatedCashFlow['flow'] = mutatedCashFlow.apply(lambda row:
        weightedFlowAmt(row['startDate'], row['EndDate'], row['tradeDate'],
                        row['tradeAmount']),
    axis=1)

Tags: function, datetime, abs, days, row, strptime, startdate, enddate
1 answer
User
#1 · Posted 2024-06-26 17:42:03

I think you can drop apply and use vectorized operations on whole columns instead:

mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])

diffTradeAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['tradeDate']).dt.days).abs()
diffStartAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['startDate']).dt.days).abs()

mutatedCashFlow['flow'] = (mutatedCashFlow['tradeAmount']*diffTradeAndEnd)/diffStartAndEnd
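
To check that the vectorized version gives the same numbers as the row-wise one, here is a minimal self-contained sketch on a made-up three-row frame. The sample values and the helper names weighted_flow_rowwise and weighted_flow_vectorized are assumptions for illustration and do not come from the question; only the column names do.

```python
from datetime import datetime

import pandas as pd

# Made-up sample data; column names follow the question.
sample = pd.DataFrame({
    "startDate": ["2020-01-01", "2020-01-01", "2020-03-01"],
    "EndDate": ["2020-01-31", "2020-12-31", "2020-03-11"],
    "tradeDate": ["2020-01-16", "2020-07-01", "2020-03-06"],
    "tradeAmount": [100.0, 250.0, 40.0],
})


def weighted_flow_rowwise(df):
    """Row-by-row version, mirroring the apply call in the question."""
    def one_row(row):
        start = datetime.strptime(row["startDate"], "%Y-%m-%d")
        end = datetime.strptime(row["EndDate"], "%Y-%m-%d")
        trade = datetime.strptime(row["tradeDate"], "%Y-%m-%d")
        return row["tradeAmount"] * abs((end - trade).days) / abs((end - start).days)
    return df.apply(one_row, axis=1)


def weighted_flow_vectorized(df):
    """Column-wise version: parse each date column once, then use Series arithmetic."""
    start = pd.to_datetime(df["startDate"])
    end = pd.to_datetime(df["EndDate"])
    trade = pd.to_datetime(df["tradeDate"])
    trade_to_end = (end - trade).dt.days.abs()
    start_to_end = (end - start).dt.days.abs()
    return df["tradeAmount"] * trade_to_end / start_to_end


rowwise = weighted_flow_rowwise(sample)
vectorized = weighted_flow_vectorized(sample)
# Print both results side by side for comparison.
print(pd.DataFrame({"rowwise": rowwise, "vectorized": vectorized}))
```

The row-wise version calls a Python function once per row, while the column-wise version parses each column once and leaves the arithmetic to pandas, which is where the speed-up is expected to come from.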

Alternative, using the sub, mul and div methods:

mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])

diffTradeAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['tradeDate']).dt.days.abs()
diffStartAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['startDate']).dt.days.abs()

# Parentheses are required so the chained call can continue on the next line
mutatedCashFlow['flow'] = (mutatedCashFlow['tradeAmount'].mul(diffTradeAndEnd)
                                                         .div(diffStartAndEnd))
print(mutatedCashFlow)
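
Two details may matter when scaling to a million rows; the sketch below is one possible way to handle them, not part of the original answer. First, passing an explicit format to pd.to_datetime skips format inference, which may speed up parsing. Second, a row whose startDate equals its EndDate makes the denominator zero. The sample data and the choice to produce NaN for such rows are assumptions.

```python
import pandas as pd

# Made-up sample data; the second row has startDate == EndDate on purpose.
frame = pd.DataFrame({
    "startDate": ["2020-01-01", "2020-02-01"],
    "EndDate": ["2020-01-11", "2020-02-01"],
    "tradeDate": ["2020-01-06", "2020-02-01"],
    "tradeAmount": [100.0, 80.0],
})

# An explicit format tells pandas how to parse, so it does not have to infer it.
for col in ["startDate", "EndDate", "tradeDate"]:
    frame[col] = pd.to_datetime(frame[col], format="%Y-%m-%d")

trade_to_end = (frame["EndDate"] - frame["tradeDate"]).dt.days.abs()
start_to_end = (frame["EndDate"] - frame["startDate"]).dt.days.abs()

# Assumption: a zero-length period should yield NaN instead of a 0/0 result.
# where() keeps values that satisfy the condition and sets the rest to NaN.
safe_denominator = start_to_end.where(start_to_end != 0)
frame["flow"] = frame["tradeAmount"] * trade_to_end / safe_denominator
print(frame)
```

Whether NaN is the right value for a zero-length period depends on the business rule; fillna can replace it afterwards if a different default is wanted.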
