Combining multiple Pandas DataFrames with duplicate datetime index pairs

Posted 2024-10-03 11:21:43


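I have three Pandas DataFrames indexed by datetime: df1, df2, and df3. Each date appears twice in every index (one 'recent' row and one 'hist' row). I want to combine the three DataFrames so that every unique date pair is kept, but pairs that occur in more than one DataFrame are merged into the same rows instead of being listed multiple times (so a plain concat is not enough). Here are samples of the DataFrames: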

In [1]: print(df1)
            CurTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-27         9.0  recent
1997-12-27         7.0    hist     
1998-02-10         9.0  recent
1998-02-10         7.0    hist
...                ...     ... 
2001-01-04        27.0  recent
2001-01-04        26.0    hist
2001-03-16        12.0  recent
2001-03-16        11.0    hist
2001-04-06        23.0  recent
2001-04-06        22.0    hist

In [2]: print(df2)
            MaxTempMid      id
fldDate                       
1998-01-02        29.0  recent
1998-01-02        28.0    hist
1998-02-15        18.0  recent
1998-02-15        23.0    hist
1998-02-23        24.0  recent
1998-02-23        15.0    hist
...                ...     ... 
2001-01-01        16.0  recent
2001-01-01        22.0    hist
2001-01-04        30.0  recent
2001-01-04        37.0    hist
2001-02-16        14.0  recent
2001-02-16        11.0    hist

In [3]: print(df3)
            MinTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-26        -3.0  recent
1997-12-26        -2.0    hist
1997-12-27        -1.0  recent
1997-12-27         0.0    hist
...                ...     ...
2001-02-18         9.0  recent
2001-02-18        36.0    hist
2001-03-11        18.0  recent
2001-03-11        38.0    hist
2001-03-12        13.0  recent
2001-03-12        16.0    hist

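The expected result is a single DataFrame indexed by fldDate, with one row per date and id combination and the CurTempMid, MaxTempMid, and MinTempMid columns side by side.

[expected output table not preserved in the source]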

After merging, the 'id' values should be identical across the DataFrames, so I only need to keep a single 'id' column.


1 answer

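If you are sure the id column is the same across the time series, this solution should work for you. Merge the three DataFrames on the fldDate and id columns with an outer join, then set the index back to fldDate.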

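# Move fldDate out of the index so it can serve as a merge key together with id.
m = (df1.reset_index()
        .merge(df2.reset_index(), on=['fldDate', 'id'], how='outer')
        .merge(df3.reset_index(), on=['fldDate', 'id'], how='outer')
        .sort_values('fldDate'))
# Restore fldDate as the index.
m.set_index('fldDate', inplace=True)
print(m.head())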
#             CurTempMid      id  MaxTempMid  MinTempMid
# fldDate
# 1997-12-23         0.0  recent         NaN         0.0
# 1997-12-23        -2.0    hist         NaN        -2.0
# 1997-12-26         NaN    hist         NaN        -2.0
# 1997-12-26         NaN  recent         NaN        -3.0
# 1997-12-27         9.0  recent         NaN        -1.0
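
To try the approach without the original data, here is a minimal self-contained sketch. It rebuilds small versions of df1, df2, and df3 from a few rows shown in the question, and generalizes the chained merges to a list of DataFrames with functools.reduce. The helpers make_frame and combine are hypothetical names introduced only for this illustration. Passing validate="one_to_one" is an extra assumption on top of the answer: it makes pandas raise an error if a (fldDate, id) pair occurs more than once within a single DataFrame, which is the condition the merge relies on.

```python
from functools import reduce

import pandas as pd


def make_frame(column, rows):
    """Build a frame indexed by fldDate from (date, value, id) tuples (hypothetical helper)."""
    frame = pd.DataFrame(rows, columns=["fldDate", column, "id"])
    frame["fldDate"] = pd.to_datetime(frame["fldDate"])
    return frame.set_index("fldDate")


# A few rows copied from the sample output in the question.
df1 = make_frame("CurTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-27", 9.0, "recent"), ("1997-12-27", 7.0, "hist"),
])
df2 = make_frame("MaxTempMid", [
    ("1998-01-02", 29.0, "recent"), ("1998-01-02", 28.0, "hist"),
])
df3 = make_frame("MinTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-26", -3.0, "recent"), ("1997-12-26", -2.0, "hist"),
    ("1997-12-27", -1.0, "recent"), ("1997-12-27", 0.0, "hist"),
])


def combine(frames, keys=("fldDate", "id")):
    """Outer-merge all frames on the (fldDate, id) pair (hypothetical helper)."""
    keys = list(keys)
    merged = reduce(
        # validate raises MergeError if a key pair repeats within one frame.
        lambda left, right: left.merge(
            right, on=keys, how="outer", validate="one_to_one"
        ),
        (frame.reset_index() for frame in frames),
    )
    # Sort by date, then id, and restore fldDate as the index.
    return merged.sort_values(keys).set_index("fldDate")


combined = combine([df1, df2, df3])
print(combined)
```

Each (fldDate, id) pair ends up on exactly one row, and a column is NaN wherever the corresponding DataFrame had no entry for that pair.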
