为便于显示,将总计标题行附加到分组数据

2024-09-29 02:20:43 发布

您现在位置:Python中文网/ 问答频道 /正文

早上好

初始数据:

df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
        }

print(df)

Trading_Book    Client  Ccy RFQ_State   Competing_Dealers   Notional
A               AA      AUD Done        2                   1000
A               AA      AUD Done        4                   1000
B               BB      AUD Done        6                   1000
B               BB      AUD Traded Away 8                   1000
C               CC      AUD Traded Away 10                  1000
C               CC      AUD Traded Away 12                  1000
C               CC      AUD Done        14                  1000

我的最终状态是一个Excel报告,总计在顶部,并按列分组

Grand Totals                7           8                       7,000       4,000           0.571429 
Trading_Book    Client  Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done   Hit_Rate
    A               AA  AUD 2           3                       2,000       2,000           1
    B               BB  AUD 2           7                       2,000       1,000           0.5
    C               CC  AUD 3           12                      3,000       1,000           0.3333

我的尝试:

a)每Trading_BookClientCcy分组添加总列,即总RFQ、总名义完成数(如果state='完成'填充)和命中率(名义完成/名义完成):

# Add the Notional where state = done
mask = df['RFQ_State'].str.contains('Done')
df.loc[mask, 'Notional_Done'] = df['Notional']
# Populate any nulls due to 0/0
df = df.fillna(0)
# Column for Done RFQ's used in Hit Rate
df['Done_RFQ'] = np.where(df['RFQ_State'].str.contains('Done'), 1, 0)
# Hit_Rate = Done Notional / Total Notional
df['Hit_Rate'] = (df['Notional_Done'] / df['Notional'])
df.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')

b)分组

df_Detail = df.groupby(['Trading_Book', 'Client', 'Ccy']).agg({'RFQ_State': 'count',
                                                          'Competing_Dealers': 'mean',
                                                          'Notional': 'sum',
                                                          'Notional_Done': 'sum',
                                                        }).reset_index()
df_Detail = df_Detail.sort_values(['Trading_Book', 'Client'], ascending=[True, True])
df_Detail.rename(columns={
    'RFQ_State': 'Total_RFQs',
    'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Detail['Hit_Rate'] = (df_Detail['Notional_Done'] / df_Detail['Notional'])

print(df_Detail)
df_Detail.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')

Trading_Book    Client  Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done
    A               AA      AUD 2           3                       2000        2000
    B               BB      AUD 2           7                       2000        1000
    C               CC      AUD 3           12                      3000        0.3333333

c)根据原始数据,找出按列分组的每一项的总计:

df_Summay = df.groupby(['Ccy']).agg({'RFQ_State': 'count',
                                                          'Competing_Dealers': 'mean',
                                                          'Notional': 'sum',
                                                          'Notional_Done': 'sum',
                                                        }).reset_index()
df_Summay.rename(columns={
    'RFQ_State': 'Total_RFQs',
    'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Summay['Hit_Rate'] = (df_Summay['Notional_Done'] / df_Summay['Notional'])
df_Summay.to_csv('C:\EOM_ML_201901\outputs\Scratchpad3' + '.csv')
print(df_Summay)

    Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done   Hit_Rate
    AUD 7           8                       7000        4000            0.571429 

问题:

  1. 基础数据中可能有多种货币,因此需要解决如何只返回原始数据中的总计,而不返回分组(AUD)
  2. 将这些总计放在df_详图上方并导出。这类似于两个数据帧的附加,即df_Summary(减去头和AUD值)堆叠在df_Detail之上

Tags: dfratecctotalstatedetaildonehit
2条回答

这真的是你所表现出来的

import pandas as pd

df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
        }


df = pd.DataFrame(df)

# generate sub-totals
dfs = (df.groupby(["Trading_Book","Client","Ccy"])
 .agg(
     Total_RFQs=("RFQ_State","count"),
     Avg_Competing_Dealers=("Competing_Dealers","mean"),
     Notional=("Notional","sum"),
     Notional_Done=("Notional", lambda s: (df.loc[s.index,"RFQ_State"].eq("Done")*s).sum())
 )
 .assign(Hit_Rate=lambda d: d["Notional_Done"]/d["Notional"])
)

# generate grandtotals, use dummy column for groupby...
dft = (df.assign(grandtotal=1).groupby(["grandtotal"], as_index=False)
 .agg(
     Total_RFQs=("RFQ_State","count"),
     Avg_Competing_Dealers=("Competing_Dealers","mean"),
     Notional=("Notional","sum"),
     Notional_Done=("Notional", lambda s: (df.loc[s.index,"RFQ_State"].eq("Done")*s).sum())
 )
 .assign(Hit_Rate=lambda d: d["Notional_Done"]/d["Notional"])
       .drop(columns="grandtotal")
)

print(f"{dft.to_string(index=False)}\n{dfs.to_string()}")

输出

 Total_RFQs  Avg_Competing_Dealers  Notional  Notional_Done  Hit_Rate
          7                      8      7000           4000  0.571429
                         Total_RFQs  Avg_Competing_Dealers  Notional  Notional_Done  Hit_Rate
Trading_Book Client Ccy                                                                      
A            AA     AUD           2                      3      2000           2000  1.000000
B            BB     AUD           2                      7      2000           1000  0.500000
C            CC     AUD           3                     12      3000           1000  0.333333

首先在df_Summay中添加soem not exist列,并按df_Detail更改顺序:

df_Summay['Trading_Book'] = 'Grand Totals'

df_Summay = df_Summay.reindex(df_Detail.columns, fill_value='', axis=1)
print(df_Summay)
   Trading_Book Client  Ccy  Total_RFQs  Avg_Competing_Dealers  Notional  \
0  Grand Totals         AUD           7                      8      7000   

   Notional_Done  Hit_Rate  
0         4000.0  0.571429  

然后使用this函数将两个数据帧追加到同一excel文件:

filename = 'out.xlsx'
append_df_to_excel(filename, df_Summay, sheet_name='Sheet2',  header=None, index=False)
append_df_to_excel(filename, df_Detail, sheet_name='Sheet2', index=False, startrow=1)

相关问题 更多 >