Dropping rows by datetime difference, with the difference recalculated after each drop

Posted 2024-09-28 01:30:58


I have a pandas DataFrame grouped by subject, where each subject has multiple encounter dates, organized as follows:

row    Subject    encounter date    difference
0      1          1/1/2015          0
1      1          1/10/2015         9
2      1          1/09/2016         364
3      2          2/8/2015          0
4      2          4/20/2015         71
5      2          3/19/2016         334
6      2          3/22/2016         3
7      2          3/20/2017         363

Desired output:

row    Subject    encounter date    difference
0      1          1/1/2015          0
2      1          1/09/2016         373
3      2          2/8/2015          0
5      2          3/19/2016         405
7      2          3/20/2017         366

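I want to iterate over all rows, grouped by subject, drop every row whose time difference from the previous row is < 365 days, and recalculate the differences between the remaining rows after each drop. My current code drops row 2 of the dataset, but I would like to change it so that the difference is recalculated once a row has been dropped: in this example, after row 1 is dropped, the next encounter is measured from row 0 and the difference comes out > 365.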

Here is my current code. Any help would be appreciated:

df = df.drop(df[(((df.groupby('Subject')['Encounter_Date'].diff().fillna(0)) / np.timedelta64(1, 'D')).astype(int) > 0) & (((df.groupby('Subject')['Encounter_Date'].diff().fillna(0)) / np.timedelta64(1, 'D')).astype(int) < 365)].index)


def drop_rows(date, subject):
    # Collect the kept rows as records so that subject and difference are both returned
    current_subject = subject[0]
    index = [0]
    rows = [{'subj': current_subject, 'diff': pd.Timedelta('0 days')}]
    j = 1                                   # offset back to the last kept row
    for i in range(1, len(date)):
        if subject[i] == current_subject:
            diff = date[i] - date[i-j]
            if diff < pd.Timedelta('365 days'):
                j += 1                      # row i is dropped
                continue
            j = 1
        else:
            diff = pd.Timedelta('0 days')   # first row of a new subject
            current_subject = subject[i]
            j = 1
        index.append(i)
        rows.append({'subj': current_subject, 'diff': diff})
    return pd.DataFrame(rows, index=index, columns=['subj', 'diff'])

1 answer
Anonymous user
#1 · Posted 2024-09-28 01:30:58

This is a bit of a hack, but it seems to work. I extended your code to handle the grouping by subject and made changes in 3 places (marked below).

import pandas as pd

def drop_rows(date, subject):
    current_subject = subject[0] # changed here
    date_diff = date - date      # timedelta=0, same shape as date
    j = 1                        # offset back to the most recently kept row
    for i in range(1,len(date)):
        date_diff[i] = date[i] - date[i-j]
        if subject[i] == current_subject:
            if date_diff[i] < pd.Timedelta('365 Days'):
                date_diff.drop(i,inplace=True)   # too close to the last kept row
                j += 1
            else:
                j = 1
        else:
            date_diff[i] = pd.Timedelta('0 Days')    # changed here
            current_subject = subject[i]             # changed here
            j = 1    # restart the offset so the next row compares within the new subject
    return date_diff

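Note that the data must be sorted by subject and then by date, and the dates are assumed to have a datetime dtype. The function also looks rows up by the labels 0..n-1, so the frame should carry the default integer index.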
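A minimal sketch of that preparation step, assuming the dates arrive as month/day/year strings and the columns are named `Subject` and `date` as in the call below (the sample frame here is a hypothetical reconstruction of the question's table):

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's table.
df = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2, 2, 2],
    "date": ["1/1/2015", "1/10/2015", "1/09/2016", "2/8/2015",
             "4/20/2015", "3/19/2016", "3/22/2016", "3/20/2017"],
})

# Parse the month/day/year strings into datetime values.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Sort by subject, then by date, and rebuild a 0..n-1 index so that
# lookups such as date[i - j] inside drop_rows refer to adjacent rows.
df = df.sort_values(["Subject", "date"]).reset_index(drop=True)
```

The `reset_index(drop=True)` call matters only if sorting reorders the rows; it is harmless otherwise.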

>>> drop_rows(df.date,df.Subject)

0     0 days
2   373 days
3     0 days
5   405 days
7   366 days
Name: date, dtype: timedelta64[ns]

To get a new DataFrame containing only the selected rows, you can do the following:

df['new'] = drop_rows(df.date,df.Subject)
df = df[ df['new'].notnull() ]
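As an alternative sketch (not the method used in this answer), the same greedy rule can be applied one subject at a time with `groupby`, which removes the need for the positional offset `j`. The helper name `keep_spaced_encounters`, its `min_gap` parameter, and the `difference` output column are hypothetical, and the columns are assumed to be `Subject` and `date` with a datetime dtype:

```python
import pandas as pd

def keep_spaced_encounters(df, min_gap=pd.Timedelta(days=365)):
    """Keep each subject's first row, then only rows that fall at least
    `min_gap` after the most recently kept row of that subject."""
    df = df.sort_values(["Subject", "date"])
    kept_labels, gaps = [], []
    for _, group in df.groupby("Subject", sort=False):
        last_kept = None
        for label, when in group["date"].items():
            if last_kept is None:
                gap = pd.Timedelta(0)      # first encounter of this subject
            else:
                gap = when - last_kept     # measured from the last kept row
                if gap < min_gap:
                    continue               # too close: drop this row
            kept_labels.append(label)
            gaps.append(gap)
            last_kept = when
    out = df.loc[kept_labels].copy()
    out["difference"] = [g.days for g in gaps]
    return out

# Hypothetical sample data mirroring the question's table.
df = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2, 2, 2],
    "date": pd.to_datetime(
        ["1/1/2015", "1/10/2015", "1/09/2016", "2/8/2015",
         "4/20/2015", "3/19/2016", "3/22/2016", "3/20/2017"],
        format="%m/%d/%Y"),
})
result = keep_spaced_encounters(df)
print(result)
```

On the sample data this keeps rows 0, 2, 3, 5 and 7 with differences of 0, 373, 0, 405 and 366 days, the same rows and values as the `drop_rows` output above.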
