How to subtract consecutive DataFrame rows using the last updated value

Posted 2024-10-06 11:52:00


I have time-series data and need to add a column based on the date the Stage column last changed. The Ids repeat across rows, for example:

Id  Date        Stage
1   20-12-2013  Basic
1   20-10-2015  Basic
1   05-12-2018  Advanced
2   20-05-2019  Basic
2   15-12-2019  Advanced
3   20-01-2020  Advanced
4   20-10-2020  Basic
4   20-12-2020  Advanced

Expected result:

Id  Date        Stage     Stage Changed Since
1   20-12-2013  Basic     NaN
1   20-10-2015  Basic     NaN
1   05-12-2018  Advanced  05-12-2018 - 20-10-2015
2   20-05-2019  Basic     NaN
2   15-12-2019  Advanced  15-12-2019 - 20-05-2019
3   20-01-2020  Advanced  NaN
4   20-10-2020  Basic     NaN
4   20-12-2020  Advanced  20-12-2020 - 20-10-2020

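So basically, whenever the stage changes within the same Id, I need the number of days between the row where the stage changed and the previous row. The Stage Changed Since column should show that difference.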


1 answer
User
Answer #1 · posted 2024-10-06 11:52:00

Check out https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html

You can create two shifted (lagged) columns, one for stage and one for date, and compare each row with the previous one:

from datetime import datetime

import pandas as pd

# sample rows keyed by index label
d = {
    1: {'date': datetime(2010, 10, 10), 'stage': 'basic'},
    2: {'date': datetime(2010, 11, 10), 'stage': 'basic'},
    3: {'date': datetime(2010, 12, 10), 'stage': 'advanced'},
}

# transpose so that each key becomes a row
df = pd.DataFrame(d).T

# create shifted columns holding the previous row's values
df['stage_lagged'] = df['stage'].shift(1)
df['date_lagged'] = df['date'].shift(1)

# where the stage differs from the previous row, record the previous row's date
df.loc[df['stage'] != df['stage_lagged'], 'stage_changed_since'] = df['date_lagged']

# convert the new column to a datetime type
df['stage_changed_since'] = pd.to_datetime(df['stage_changed_since'])

# keep only the columns of interest
df = df[['date', 'stage', 'stage_changed_since']]

This gives you:

        date     stage stage_changed_since
1 2010-10-10     basic                 NaT
2 2010-11-10     basic                 NaT
3 2010-12-10  advanced          2010-11-10
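
The snippet above shifts over the whole frame and returns the previous date rather than a day count. For the data in the question, the shift would presumably need to happen within each Id, followed by a date subtraction. Here is a minimal sketch of that, assuming the dates are day-first strings (DD-MM-YYYY); the column name `days_since_stage_change` is made up for illustration and does not come from the original post.

```python
import pandas as pd

# The question's sample data (dates assumed to be DD-MM-YYYY)
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2, 3, 4, 4],
    "Date": ["20-12-2013", "20-10-2015", "05-12-2018", "20-05-2019",
             "15-12-2019", "20-01-2020", "20-10-2020", "20-12-2020"],
    "Stage": ["Basic", "Basic", "Advanced", "Basic",
              "Advanced", "Advanced", "Basic", "Advanced"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df = df.sort_values(["Id", "Date"]).reset_index(drop=True)

# Previous row's values within the same Id (missing on each Id's first row)
grouped = df.groupby("Id")
prev_stage = grouped["Stage"].shift(1)
prev_date = grouped["Date"].shift(1)

# A change needs a previous row in the same Id and a different stage
changed = prev_stage.notna() & (df["Stage"] != prev_stage)

# Days between the current and previous row, kept only where the stage changed
# (hypothetical column name, chosen for this sketch)
df["days_since_stage_change"] = (df["Date"] - prev_date).dt.days.where(changed)

print(df)
```

With this data, the three rows where the stage changes get 1142, 209 and 61 days, and every other row stays NaN, including the single row for Id 3, which has no earlier row to compare against.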
