Pandas通过多个索引和列连接两个数据帧

2024-10-05 11:05:05 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个数据框对象,它们需要由多个索引和列连接起来

DF1每日数据(指数为RNK、R_ID、纬度和经度):

                                Date        FFDI
RNK R_ID latitude   longitude               
1   0   -39.20000   140.80000   1973-04-02  5.40000
    1   -39.20000   140.83786   1973-04-02  5.40000
    2   -39.20000   140.87572   1973-04-02  5.40000
    3   -39.20000   140.91359   1973-04-02  5.40000
    4   -39.20000   140.95145   1973-04-02  5.40000
    5   -39.20000   140.98930   1973-04-02  5.40000
    6   -39.20000   141.02716   1973-04-02  5.40000
    7   -39.20000   141.06502   1973-05-31  5.40000
    8   -39.20000   141.10289   1973-05-31  5.50000
    9   -39.20000   141.14075   1973-05-31  6.00000
    10  -39.20000   141.17860   1973-05-31  6.40000
    11  -39.20000   141.21646   1973-05-31  6.80000
    12  -39.20000   141.25432   1973-05-31  7.70000
    13  -39.20000   141.29219   1973-05-31  7.90000
    14  -39.20000   141.33005   1973-05-31  7.00000
    15  -39.20000   141.36790   1973-05-31  6.60000
    16  -39.20000   141.40576   1973-05-31  6.10000
    17  -39.20000   141.44362   1973-05-31  5.00000
    18  -39.20000   141.48149   1973-05-31  4.40000
    19  -39.20000   141.51935   1972-04-21  4.40000
    20  -39.20000   141.55721   1972-04-21  4.40000
    21  -39.20000   141.59506   1972-04-21  4.50000
    22  -39.20000   141.63292   1972-04-21  4.60000
    23  -39.20000   141.67079   1972-04-21  4.70000
    24  -39.20000   141.70865   1972-04-21  4.70000
    25  -39.20000   141.74651   1972-04-21  4.70000
    26  -39.20000   141.78436   1972-04-21  4.70000
    27  -39.20000   141.82222   1972-04-21  4.70000
    28  -39.20000   141.86009   1972-04-21  4.70000
    29  -39.20000   141.89795   1972-04-21  4.70000
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
5   36082   -33.90000   148.90205   1972-12-24  35.70000
    36083   -33.90000   148.93991   1974-11-12  36.30000
    36084   -33.90000   148.97778   1974-11-12  35.90000
    36085   -33.90000   149.01564   1973-11-20  36.80000
    36086   -33.90000   149.05350   1973-11-20  37.00000
    36087   -33.90000   149.09135   1974-11-12  35.60000
    36088   -33.90000   149.12921   1973-01-03  35.90000
    36089   -33.90000   149.16708   1973-01-03  34.40000
    36090   -33.90000   149.20494   1973-01-03  32.90000
    36091   -33.90000   149.24280   1973-01-03  32.20000
    36092   -33.90000   149.28065   1973-01-03  32.30000
    36093   -33.90000   149.31851   1973-01-03  32.20000
    36094   -33.90000   149.35638   1973-01-03  30.20000
    36095   -33.90000   149.39424   1973-11-20  28.60000
    36096   -33.90000   149.43210   1973-11-20  28.70000
    36097   -33.90000   149.46996   1973-11-20  29.10000
    36098   -33.90000   149.50781   1973-11-20  30.10000
    36099   -33.90000   149.54568   1973-11-20  30.80000
    36100   -33.90000   149.58354   1973-01-09  30.60000
    36101   -33.90000   149.62140   1973-01-09  30.10000
    36102   -33.90000   149.65926   1973-01-09  29.50000
    36103   -33.90000   149.69711   1973-01-09  29.20000
    36104   -33.90000   149.73499   1973-01-09  29.90000
    36105   -33.90000   149.77284   1973-01-09  29.90000
    36106   -33.90000   149.81070   1973-01-09  27.60000
    36107   -33.90000   149.84856   1973-01-09  24.40000
    36108   -33.90000   149.88641   1973-01-09  23.80000
    36109   -33.90000   149.92429   1973-01-09  23.80000
    36110   -33.90000   149.96214   1973-01-09  24.10000
    36111   -33.90000   150.00000   1973-01-09  25.30000

DF2每小时数据(索引=R\u ID):

     latitude   longitude    time                T_SFC
R_ID                
0   -39.20000   140.80000   1972-01-20 00:00:00 15.80000
0   -39.20000   140.80000   1972-01-20 01:00:00 15.90000
0   -39.20000   140.80000   1972-01-20 02:00:00 16.00000
0   -39.20000   140.80000   1972-01-20 03:00:00 16.20000
0   -39.20000   140.80000   1972-01-20 04:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 05:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 06:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 07:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 08:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 09:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 10:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 11:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 12:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 13:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 14:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 15:00:00 16.70000
0   -39.20000   140.80000   1972-01-20 16:00:00 16.70000
0   -39.20000   140.80000   1972-01-20 17:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 18:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 19:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 20:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 21:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 22:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 23:00:00 16.40000
0   -39.20000   140.80000   1972-01-21 00:00:00 16.40000
0   -39.20000   140.80000   1972-01-21 01:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 02:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 03:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 04:00:00 16.10000
0   -39.20000   140.80000   1972-01-21 05:00:00 16.00000
... ... ... ... ...
36111   -38.87551   141.14075   1974-12-30 18:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 19:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 20:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 21:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 22:00:00 14.20000
36111   -38.87551   141.14075   1974-12-30 23:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 00:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 01:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 02:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 03:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 04:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 05:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 06:00:00 14.60000
36111   -38.87551   141.14075   1974-12-31 07:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 08:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 09:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 10:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 11:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 12:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 13:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 14:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 15:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 16:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 17:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 18:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 19:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 20:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 21:00:00 14.60000
36111   -38.87551   141.14075   1974-12-31 22:00:00 14.70000
36111   -38.87551   141.14075   1974-12-31 23:00:00 14.80000

DF1有一个日期列,其中包含1972-01-20到1974-12-31的每日值,而DF2有一个时间列,其中包含1972-01-20T00:00:00到1974-12-31T23:00:00的小时值DF1按RNK(秩)和FFDI排序,而DF2按R\u ID和时间排序。一个R_ID是一个参考ID,对应于一对唯一的纬度和经度DF2将以DF2的时间列所属的相同R_ID和相同日期连接到DF1。也就是说DF1中的每一行(天)将有来自DF2的24(小时)行具有相同的天值

输出df如下所示:

                                                              time                   T_SFC
RNK  R_ID       latitude    longitude   Date        FFDI
1    0          -39.20000   140.80000   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     1          -39.20000   140.83786   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     2          -39.20000   140.87572   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     ... ... ... ...
2    0          -39.20000   140.80000   1974-03-07  5.60000   1974-03-07 00:00:00    15.8
                                                              1974-03-07 01:00:00    15.9
                                                              1974-03-07 02:00:00    16.0
                                                              1974-03-07 03:00:00    16.2
                                                              1974-03-07 04:00:00    16.6
                                                              ... ... ... ...
     1          -39.20000   140.83786   1973-03-09  5.40000   1973-03-09 00:00:00    15.8
                                                              1973-03-09 01:00:00    15.9
                                                              1973-03-09 02:00:00    16.0
                                                              1973-03-09 03:00:00    15.2
                                                              1973-03-09 04:00:00    15.6
                                                              ... ... ... ...
... ... ... ...
... ... ... ...
5    36082     -33.90000    148.90205   1972-12-24  35.70000  1972-12-24 00:00:00    19.8
                                                              1972-12-24 01:00:00    19.1
                                                              1972-12-24 02:00:00    22.0
                                                              1972-12-24 03:00:00    24.2
                                                              1972-12-24 04:00:00    21.6
                                                              ... ... ... ...
     ... ... ... ...
     36111     -33.90000    150.00000   1973-01-09  25.30000  1973-01-09 00:00:00    19.8
                                                              1973-01-09 01:00:00    19.1
                                                              1973-01-09 02:00:00    22.0
                                                              1973-01-09 03:00:00    24.2
                                                              1973-01-09 04:00:00    21.6
                                                              ... ... ... ...
                                                              1973-01-09 23:00:00    19.1
4,333,440 rows x 2 columns

按照@politinsa的回答,我试着

# Add a new column Date and save date part of the time column to it.
df2['Date'] = df2['time'].dt.date.astype('datetime64[ns]')

df_joined = pd.merge(df1, df2, on=['REF_ID', 'Date'], how='inner')

输出的问题是df1中的多索引没有保留,而输出df中缺少RNK

print(df_joined)

        time    FFDI         latitude   longitude   T_SFC       time_original
REF_ID                      
0   1973-04-02  5.40000     -39.20000   140.80000   16.40000    1973-04-02 00:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   16.00000    1973-04-02 01:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.70000    1973-04-02 02:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 03:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 04:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 05:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 06:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 07:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 08:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 09:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 10:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 11:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 12:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 13:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.00000    1973-04-02 14:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 15:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.30000    1973-04-02 16:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 17:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 18:00:00

... ... ... ...
12000 rows × 6 columns

Tags: 数据iddfdatetime时间df1df2
1条回答
网友
1楼 · 发布于 2024-10-05 11:05:05

您可以在DF2中创建一个包含日期的列(而不是日期时间),即在第1973-04-02 01:00:00行创建一个包含1973-04-02的列Date

然后使用一个经典的内部连接(pd.merge(df1, df2, on=['R_ID', 'Date'], how='inner')),它应该可以做到这一点

相关问题 更多 >

    热门问题