Backfilling missing DataFrame values by hour/week

Posted 2024-10-05 14:21:22


I have a DataFrame like this:

Code                   DIAG
Time
1999-12-01 00:00:01.870     None
1999-12-01 00:00:10.870     None
2000-01-01 09:10:09.870    None
2000-01-01 09:10:10.870    None
2000-01-01 09:00:10.940    None
2000-01-01 09:00:11.160    None
2000-01-01 09:00:11.640    None
2000-01-01 09:00:12.460    None
2010-01-01 09:00:34.910    1_19_1_4_0_0
2010-01-01 09:00:35.060    3_22_4_0_0_0
2010-01-01 09:00:35.120    6_22_10_3_0_0

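I only want to backfill the missing values that fall within the hour before each data point, and change the label, so that the data looks like this: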

Code                             DIAG
    Time
    1999-12-01 00:00:01.870     None
    1999-12-01 00:00:10.870     None
    2000-01-01 09:10:09.870    1_19_1_4_0_0_H
    2000-01-01 09:10:10.870    1_19_1_4_0_0_H
    2000-01-01 09:00:10.940    1_19_1_4_0_0_H
    2000-01-01 09:00:11.160    1_19_1_4_0_0_H
    2000-01-01 09:00:11.640    1_19_1_4_0_0_H
    2000-01-01 09:00:12.460    1_19_1_4_0_0_H
    2010-01-01 09:00:34.910    1_19_1_4_0_0_H
    2010-01-01 09:00:35.060    3_22_4_0_0_0
    2010-01-01 09:00:35.120    6_22_10_3_0_0

I wrote this code:

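import pandas as pd

def FillData(dff):
    # Backfill within the group, then append '_H' to every non-null value
    s = dff.bfill()
    s.loc[s.notnull()] = s.astype('str') + '_H'
    return s

# A is the DataFrame shown above, indexed by Time
df = A['DIAG'].groupby(pd.Grouper(freq='H')).apply(FillData)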

The problem is that this produces the following output:

Code                             DIAG
    Time
    1999-12-01 00:00:01.870     None
    1999-12-01 00:00:10.870     None
    2000-01-01 09:10:09.870    None
    2000-01-01 09:10:10.870    None
    2000-01-01 09:00:10.940    None
    2000-01-01 09:00:11.160    None
    2000-01-01 09:00:11.640    None
    2000-01-01 09:00:34.460    1_19_1_4_0_0_H
    2010-01-01 09:00:34.910    1_19_1_4_0_0_H
    2010-01-01 09:00:35.060    3_22_4_0_0_0_H
    2010-01-01 09:00:35.120    6_22_10_3_0_0_H

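I see two main problems. First, groupby does not seem to group by hour (H), only by minute. Second, it appends the label (_H) to every row. My main goal is to mark the rows in the 1 hour before a data point with H, and the rows in the 1 week before a data point with W.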

I would be grateful if anyone could help. I have spent a lot of time on this, but I cannot find a straightforward way to do it.

Thanks
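
One possible approach to the goal described in the question, sketched under some assumptions: `pd.Grouper(freq='H')` bins rows into fixed calendar hours, which is not the same as "the hour before each value", so this sketch instead measures the time gap from each missing row to the next observed value. It also records which rows were missing before filling, so the suffix is added only to filled rows. The function name `label_before_events`, the `windows` parameter, and the sample data are hypothetical and not from the original post; the sketch assumes missing values are real `None`/`NaN` (not the string `'None'`) and that the index is a `DatetimeIndex`.

```python
import pandas as pd

def label_before_events(s, windows=(("1h", "_H"), ("7D", "_W"))):
    """Fill missing values that precede an observed value by at most a
    given window, appending that window's suffix to the filled value.
    `windows` must be ordered from narrowest to widest."""
    s = s.sort_index()                      # backfill needs time order
    observed = s.notna().to_numpy()         # rows that already hold data
    times = pd.Series(s.index, index=s.index)
    # Time from each row to the next observed row (NaT if there is none).
    gap = times.where(observed).bfill() - times
    next_val = s.bfill()                    # value of the next observed row
    out = s.copy()
    remaining = ~observed                   # missing rows not yet labeled
    for window, suffix in windows:
        hit = remaining & (gap <= pd.Timedelta(window)).to_numpy()
        out = out.where(~hit, next_val.astype(str) + suffix)
        remaining &= ~hit
    return out

# Hypothetical sample data, not taken from the question.
idx = pd.to_datetime([
    "2009-12-01 00:00:00",   # more than a week before the observed value
    "2010-01-01 07:30:00",   # within a week, but more than an hour before
    "2010-01-01 08:30:00",   # within an hour before
    "2010-01-01 09:00:00",   # observed value
])
diag = pd.Series([None, None, None, "1_19_1_4_0_0"], index=idx, name="DIAG")
labeled = label_before_events(diag)
print(labeled)
```

Both masks are computed from the original series rather than by running two passes, since a second pass over already filled rows would treat the `_H` values as observations and produce labels such as `_H_W`. On the question's DataFrame this would be called as `label_before_events(A['DIAG'])`, assuming `A` is indexed by `Time`.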

