我已经设法用for循环实现了这一点,但是在我正在使用的大型数据集上,它们的速度非常慢,所以我尝试用pandas、groupby、apply和lamda函数来实现这一点。你知道吗
import pandas as pd
example_df = pd.DataFrame({"scen": [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
"cusip": ['031162CF5', '031162CF5', '031162CF5', '031162CF5', '38141GWM2', '38141GWM2', '38141GWM2', '38141GWM2', '031162CF5', '031162CF5', '031162CF5', '031162CF5', '38141GWM2', '38141GWM2', '38141GWM2', '38141GWM2'],
"wal": [50, 55, 60, 65, 40, 50, 60, 70, 40, 45, 50, 55, 30, 40, 50, 60],
"par_val": [900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000],
"prin_cf": [0, 100000, 100000, 100000, 0, 100000, 100000, 100000, 0, 100000, 100000, 100000, 0, 100000, 100000, 100000],
"amortization": [166.67, 0, 0, 0, 208.33, 0, 0, 0, 208.33, 0, 0, 0, 277.78, 0, 0, 0],
"book_val": [1000000, 0, 0, 0, 1000000, 0, 0, 0, 1000000, 0, 0, 0, 1000000, 0, 0, 0]})
for x in range(1, len(example_df['scen'])):
if (example_df['cusip'][x] == example_df['cusip'][x-1]):
# If bond matures, don't report book value
if(example_df['par_val'][x] == 0):
example_df['book_val'][x] = 0
else:
example_df['book_val'][x] = example_df['book_val'][x-1] - example_df['amortization'][x-1] - example_df['prin_cf'][x-1]
example_df['amortization'][x] = (example_df['book_val'][x] - example_df['par_val'][x]) / example_df['wal'][x] / 12
example_df
棘手的是,每一行的账面价值取决于前一行的摊销价值,而每一摊销价值取决于同一行的账面价值。看看这里对一个类似问题的回答,我认为有一种方法可以做到这一点,即使用全局变量来跟踪以前的值。你知道吗
比如:
def calc_bv(prin_cf, par_val, wal):
global bvalue, amort
bvalue = bvalue - amort - prin_cf
amort = (bvalue - par_val)/wal/12
return bvalue, amort
bvalue = example_df.loc[0, 'book_val']
amort = example_df.loc[0, 'amortization']
example_df[1:][['book_val','amortization']] = example_df2[1:].apply(lambda row: calc_bv(row['prev_prin_cf'],row['par_val'],row['wal']), axis=1, result_type="expand")
example_df
毫无疑问,会有一个基于
groupby
的智能熊猫解决方案。但是,只要用numba
重写循环,就可以获得大约1000倍的性能改进。你知道吗基准代码
原件:
麻木:
相关问题 更多 >
编程相关推荐