早上好
初始数据:
df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
}
print(df)
Trading_Book Client Ccy RFQ_State Competing_Dealers Notional
A AA AUD Done 2 1000
A AA AUD Done 4 1000
B BB AUD Done 6 1000
B BB AUD Traded Away 8 1000
C CC AUD Traded Away 10 1000
C CC AUD Traded Away 12 1000
C CC AUD Done 14 1000
我的最终状态是一个Excel报告,总计在顶部,并按列分组
Grand Totals 7 8 7,000 4,000 0.571429
Trading_Book Client Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done Hit_Rate
A AA AUD 2 3 2,000 2,000 1
B BB AUD 2 7 2,000 1,000 0.5
C CC AUD 3 12 3,000 1,000 0.3333
我的尝试:
a)每Trading_Book
、Client
、Ccy
分组添加总列,即总RFQ、总名义完成数(如果state
='完成'填充)和命中率(名义完成/名义完成):
# Add the Notional where state = done
mask = df['RFQ_State'].str.contains('Done')
df.loc[mask, 'Notional_Done'] = df['Notional']
# Populate any nulls due to 0/0
df = df.fillna(0)
# Column for Done RFQ's used in Hit Rate
df['Done_RFQ'] = np.where(df['RFQ_State'].str.contains('Done'), 1, 0)
# Hit_Rate = Done Notional / Total Notional
df['Hit_Rate'] = (df['Notional_Done'] / df['Notional'])
df.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')
b)分组
df_Detail = df.groupby(['Trading_Book', 'Client', 'Ccy']).agg({'RFQ_State': 'count',
'Competing_Dealers': 'mean',
'Notional': 'sum',
'Notional_Done': 'sum',
}).reset_index()
df_Detail = df_Detail.sort_values(['Trading_Book', 'Client'], ascending=[True, True])
df_Detail.rename(columns={
'RFQ_State': 'Total_RFQs',
'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Detail['Hit_Rate'] = (df_Detail['Notional_Done'] / df_Detail['Notional'])
print(df_Detail)
df_Detail.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')
Trading_Book Client Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done
A AA AUD 2 3 2000 2000
B BB AUD 2 7 2000 1000
C CC AUD 3 12 3000 0.3333333
c)根据原始数据,找出按列分组的每一项的总计:
df_Summay = df.groupby(['Ccy']).agg({'RFQ_State': 'count',
'Competing_Dealers': 'mean',
'Notional': 'sum',
'Notional_Done': 'sum',
}).reset_index()
df_Summay.rename(columns={
'RFQ_State': 'Total_RFQs',
'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Summay['Hit_Rate'] = (df_Summay['Notional_Done'] / df_Summay['Notional'])
df_Summay.to_csv('C:\EOM_ML_201901\outputs\Scratchpad3' + '.csv')
print(df_Summay)
Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done Hit_Rate
AUD 7 8 7000 4000 0.571429
问题:
df_Summary
(减去头和AUD值)堆叠在df_Detail
之上
这真的是你所表现出来的
groupby()
输出
首先在
df_Summay
中添加soem not exist列,并按df_Detail
更改顺序:然后使用this函数将两个数据帧追加到同一excel文件:
相关问题 更多 >
编程相关推荐