如何将2个数据帧与非唯一多索引合并/合并/连接,以协调内容?

2024-09-30 18:15:43 发布

您现在位置:Python中文网/ 问答频道 /正文

我有2个来自2个源的下面的数据帧,3个白色列是索引。这是两份关于历史交易的报告。只有当“交易日期”、“交换工具”和“提示日期”三列相同时,才能对交易进行比较。“交易日期”是因为它们是按时间顺序报告的。只有当“交易工具”和“提示日期”相同时,期货合约才是相同的

df1

df2

我只想合并2个dfs,这样就有7列具有相同的3个索引。两份报告中的同一指数的值并非唯一的,这可能具有挑战性:例如,对于8月1日的CMX Cu合同,提示2020-03-01,有3次交易和不同的价格:

我尝试了concat和merge,但从未得到想要的df。。例如,在尝试时

df_complete= pd.concat([df_ctrm_timelined, df_broker_timelined],axis=1) 

我明白了

ValueError: cannot handle a non-unique multi-index!

如果需要原始数据,这是两个dfs的前10行,两个dfs的行数都相同

df_经纪人_时间限制[:10]

                                           broker_Lots  broker_TradePrice  \
Trade Date Exchange Instrument PromptDate                                   
2019-08-01 CMX Cu              2020-03-01           -1             2.6840   
                               2020-03-01           -1             2.6865   
                               2020-03-01           -2             2.6870   
                               2019-09-01            1             2.6640   
                               2019-09-01            1             2.6665   
                               2019-09-01            2             2.6670   
           LME Al              2019-10-16            6          1777.5000   
                               2019-11-01           -3          1779.0000   
                               2019-11-01           -1          1779.0000   
                               2019-11-01           -2          1779.0000  



                                           broker_Quantity  
Trade Date Exchange Instrument PromptDate                   
2019-08-01 CMX Cu              2020-03-01           -25000  
                               2020-03-01           -25000  
                               2020-03-01           -50000  
                               2019-09-01            25000  
                               2019-09-01            25000  
                               2019-09-01            50000  
           LME Al              2019-10-16              150  
                               2019-11-01              -75  
                               2019-11-01              -25  
                               2019-11-01              -50  

df_ctrm_时间限制[:10]

                                           ctrm_TradePrice  ctrm_Lots  \
Trade Date Exchange Instrument PromptDate                               
2019-08-01 CMX Cu              2019-09-30           2.6640          1   
                               2019-09-30           2.6665          1   
                               2019-09-30           2.6670          2   
                               2020-03-31           2.6840         -1   
                               2020-03-31           2.6865         -1   
                               2020-03-31           2.6870         -2   
           LME Al              2019-10-16        1777.5000          6   
                               2019-11-01        1792.5000          3   
                               2019-11-01        1792.5000          3   
                               2019-11-01        1781.5000         -6   

                                           ctrm_Quantity    Strategy  
Trade Date Exchange Instrument PromptDate                             
2019-08-01 CMX Cu              2019-09-30          25000  Strategy 1  
                               2019-09-30          25000  Strategy 1  
                               2019-09-30          50000  Strategy 1  
                               2020-03-31         -25000  Strategy 1  
                               2020-03-31         -25000  Strategy 1  
                               2020-03-31         -50000  Strategy 1  
           LME Al              2019-10-16            150  Strategy 2  
                               2019-11-01             75  Strategy 2  
                               2019-11-01             75  Strategy 2  
                               2019-11-01           -150  Strategy 2 

Tags: dfdateexchange报告交易brokeraltrade
1条回答
网友
1楼 · 发布于 2024-09-30 18:15:43

处理非唯一索引:

from seaborn import load_dataset

#Create one dataframe with unique indexes, set multiindex
df = load_dataset('tips')
df = df.set_index(['day', 'time', 'sex', 'smoker'])
#Create a unique label per inner most index
df = df.set_index(df.groupby(level=[0,1,2,3]).cumcount(), append=True)

#Create second dataframe 
df2 = df * 2

#Use join 
df.join(df2, lsuffix='_2')

输出:

                              total_bill_2  tip_2  size_2  total_bill    tip  size
day  time   sex    smoker                                                         
Sun  Dinner Female No     0          16.99   1.01       2       33.98   2.02     4
            Male   No     0          10.34   1.66       3       20.68   3.32     6
                          1          21.01   3.50       3       42.02   7.00     6
                          2          23.68   3.31       2       47.36   6.62     4
            Female No     1          24.59   3.61       4       49.18   7.22     8
...                                    ...    ...     ...         ...    ...   ...
Sat  Dinner Male   No     30         29.03   5.92       3       58.06  11.84     6
            Female Yes    14         27.18   2.00       2       54.36   4.00     4
            Male   Yes    26         22.67   2.00       2       45.34   4.00     4
                   No     31         17.82   1.75       2       35.64   3.50     4
Thur Dinner Female No     0          18.78   3.00       2       37.56   6.00     4

[244 rows x 6 columns]

以下是使用“提示”数据集的示例:

from seaborn import load_dataset

#Create one dataframe with unique indexes, set multiindex
df = load_dataset('tips')
df = df.set_index(['day', 'time', 'sex', 'smoker'])
df = df.groupby(level=[0,1,2,3]).first().dropna(how='all')

#Create second dataframe 
df2 = df * 2

#Use join 
df.join(df2, lsuffix='_2')

输出:

                           total_bill_2  tip_2  size_2  total_bill    tip  size
day  time   sex    smoker                                                      
Thur Lunch  Male   Yes            19.44   3.00     2.0       38.88   6.00   4.0
                   No             27.20   4.00     4.0       54.40   8.00   8.0
            Female Yes            19.81   4.19     2.0       39.62   8.38   4.0
                   No             10.07   1.83     1.0       20.14   3.66   2.0
     Dinner Female No             18.78   3.00     2.0       37.56   6.00   4.0
Fri  Lunch  Male   Yes            12.16   2.20     2.0       24.32   4.40   4.0
            Female Yes            13.42   3.48     2.0       26.84   6.96   4.0
                   No             15.98   3.00     3.0       31.96   6.00   6.0
     Dinner Male   Yes            28.97   3.00     2.0       57.94   6.00   4.0
                   No             22.49   3.50     2.0       44.98   7.00   4.0
            Female Yes             5.75   1.00     2.0       11.50   2.00   4.0
                   No             22.75   3.25     2.0       45.50   6.50   4.0
Sat  Dinner Male   Yes            38.01   3.00     4.0       76.02   6.00   8.0
                   No             20.65   3.35     3.0       41.30   6.70   6.0
            Female Yes             3.07   1.00     1.0        6.14   2.00   2.0
                   No             20.29   2.75     2.0       40.58   5.50   4.0
Sun  Dinner Male   Yes             7.25   5.15     2.0       14.50  10.30   4.0
                   No             10.34   1.66     3.0       20.68   3.32   6.0
            Female Yes            17.51   3.00     2.0       35.02   6.00   4.0
                   No             16.99   1.01     2.0       33.98   2.02   4.0

相关问题 更多 >