Rolling most recent value over the last 10 minutes in a DataFrame

Posted 2024-09-27 23:15:49


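I want to create a new column based on the values of an existing column. Each row of the "CurrentValue" column should equal the most recent value that appeared in the "InitialValue" column within the last 10 minutes.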

Here is the dataset (CSV format):

date,InitialValue
3/20/2020 1:00,
3/20/2020 1:01,
3/20/2020 1:02,
3/20/2020 1:03,
3/20/2020 1:04,
3/20/2020 1:05,
3/20/2020 1:07,
3/20/2020 1:12,
3/20/2020 1:13,
3/20/2020 1:15,
3/20/2020 1:16,555
3/20/2020 1:17,
3/20/2020 1:19,
3/20/2020 1:20,
3/20/2020 1:22,
3/20/2020 1:26,576
3/20/2020 1:27,
3/20/2020 1:28,
3/20/2020 1:34,
3/20/2020 1:35,
3/20/2020 1:36,
3/20/2020 1:37,
3/20/2020 1:38,577
3/20/2020 1:40,
3/20/2020 1:42,
3/20/2020 1:43,
3/20/2020 1:44,
3/20/2020 1:45,
3/20/2020 1:51,

Here is the expected output:

date,InitialValue,CurrentValue
2020-03-20 01:00:00,,
2020-03-20 01:01:00,,
2020-03-20 01:02:00,,
2020-03-20 01:03:00,,
2020-03-20 01:04:00,,
2020-03-20 01:05:00,,
2020-03-20 01:07:00,,
2020-03-20 01:12:00,,
2020-03-20 01:13:00,,
2020-03-20 01:15:00,,
2020-03-20 01:16:00,555.0,555.0
2020-03-20 01:17:00,,555.0
2020-03-20 01:19:00,,555.0
2020-03-20 01:20:00,,555.0
2020-03-20 01:22:00,,555.0
2020-03-20 01:26:00,576.0,576.0
2020-03-20 01:27:00,,576.0
2020-03-20 01:28:00,,576.0
2020-03-20 01:34:00,,576.0
2020-03-20 01:35:00,,576.0
2020-03-20 01:36:00,,576.0
2020-03-20 01:37:00,,
2020-03-20 01:38:00,577.0,577.0
2020-03-20 01:40:00,,577.0
2020-03-20 01:42:00,,577.0
2020-03-20 01:43:00,,577.0
2020-03-20 01:44:00,,577.0
2020-03-20 01:45:00,,577.0
2020-03-20 01:51:00,,

Update: "Pandas - Using 'ffill' on values other than Na" is not the correct answer for this case.

Update 2: the output data has been updated.


2 answers

I assume df['date'] is of datetime type. If it holds strings, convert it first with

df['date'] = pd.to_datetime(df['date'])
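
For the question's CSV, whose timestamps look like 3/20/2020 1:00, the conversion might be sketched as follows. The explicit format string is an assumption about the file (month/day/year, 24-hour clock); if it is omitted, pandas tries to infer the format instead.

```python
import io
import pandas as pd

# Two rows taken from the question's CSV, inlined so the sketch runs on its own.
csv_text = """date,InitialValue
3/20/2020 1:00,
3/20/2020 1:16,555
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumed layout: month/day/year hour:minute.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M")
```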

Solution 1 (shorter):

Use pd.DataFrame.rolling with a 10-minute offset window

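# rolling with a time offset needs a sorted DatetimeIndex
df = df.set_index('date')
# closed='both' makes the window [t - 10min, t]; .iloc[-1] takes the last forward-filled value by position
df['CurrentValue'] = df.rolling('10min', closed='both')['InitialValue'].apply(lambda x: x.ffill().iloc[-1])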
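
As a rough, self-contained illustration of the rolling approach, the sketch below runs it on a handful of timestamps taken from the question. The names `sample` and `last_valid` are hypothetical, and the helper is just one way to pick the last non-missing value in a window.

```python
import pandas as pd

nan = float("nan")

# A few rows from the question's data, indexed by timestamp.
sample = pd.DataFrame(
    {"InitialValue": [nan, 555.0, nan, 576.0, nan, nan]},
    index=pd.to_datetime([
        "2020-03-20 01:15", "2020-03-20 01:16", "2020-03-20 01:22",
        "2020-03-20 01:26", "2020-03-20 01:36", "2020-03-20 01:37",
    ]),
)

def last_valid(window):
    # Hypothetical helper: last non-missing value in the window, NaN if none.
    valid = window.dropna()
    return valid.iloc[-1] if len(valid) else nan

# closed="both" includes both ends of the window, i.e. [t - 10min, t].
sample["CurrentValue"] = (
    sample["InitialValue"]
    .rolling("10min", closed="both")
    .apply(last_valid, raw=False)
)
print(sample)
```

Here the 01:36 row still sees the 01:26 observation because the window is closed on both ends, while the 01:37 row does not.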

Solution 2 (faster):

Find the date and value of the last observation for each row

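# 'date' must be a regular column here (not the index, as in Solution 1)
# date of the last observation: keep the date where InitialValue is present, then forward-fill
lastDate = df['date'].mask(pd.isnull(df['InitialValue']))
lastDate = lastDate.ffill()

# fill the latest observation into CurrentValue if lastDate is at most 600 s old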
seconds_since_last = (df['date'] - lastDate).dt.total_seconds()
df['CurrentValue'] = df['InitialValue'].ffill().mask(seconds_since_last > 600)
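
The same idea can be wrapped in a small function and tried on a few rows from the question. This is only a sketch: `fill_recent` and its `max_age` parameter are hypothetical names, and it assumes 'date' is a sorted datetime column rather than the index.

```python
import pandas as pd

nan = float("nan")

def fill_recent(df, max_age="10min"):
    # Hypothetical helper: forward-fill InitialValue, but only while the
    # last observation is at most max_age old.
    last_date = df["date"].where(df["InitialValue"].notna()).ffill()
    age = df["date"] - last_date          # NaT before the first observation
    out = df.copy()
    # Comparisons with NaT are False, so those rows stay NaN.
    out["CurrentValue"] = df["InitialValue"].ffill().where(age <= pd.Timedelta(max_age))
    return out

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-03-20 01:15", "2020-03-20 01:16", "2020-03-20 01:22",
        "2020-03-20 01:26", "2020-03-20 01:36", "2020-03-20 01:37",
        "2020-03-20 01:38", "2020-03-20 01:45", "2020-03-20 01:51",
    ]),
    "InitialValue": [nan, 555.0, nan, 576.0, nan, nan, 577.0, nan, nan],
})

result = fill_recent(df)
print(result)
```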

Alternatively, loop over the rows and look back through each 10-minute window explicitly:

import datetime

import numpy as np
import pandas as pd

df = pd.read_csv('filename.csv')
df['CurrentValue'] = np.nan  # np.NaN was removed in NumPy 2.0

df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
ten_minutes = datetime.timedelta(minutes=10)

for ts in df.index:
    # non-missing values in the closed window [ts - 10 min, ts]
    window = df.loc[ts - ten_minutes: ts, 'InitialValue'].dropna()
    if not window.empty:
        # most recent non-missing value in the window
        df.at[ts, 'CurrentValue'] = window.iloc[-1]
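
A different technique, not used in either answer above, is pd.merge_asof with a tolerance: each row is matched to the most recent observed row at or before it, and matches older than the tolerance are dropped. The sketch below is a hedged illustration on a few rows from the question; the name `observed` is hypothetical, and the boundary row (01:36, exactly 10 minutes after 01:26) is worth checking against your pandas version.

```python
import pandas as pd

nan = float("nan")

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-03-20 01:15", "2020-03-20 01:16", "2020-03-20 01:22",
        "2020-03-20 01:26", "2020-03-20 01:36", "2020-03-20 01:37",
    ]),
    "InitialValue": [nan, 555.0, nan, 576.0, nan, nan],
})

# Right-hand table: only the rows that actually carry an observation.
observed = (
    df.dropna(subset=["InitialValue"])
      .rename(columns={"InitialValue": "CurrentValue"})
)

# Both frames must be sorted by the key column.
result = pd.merge_asof(
    df, observed, on="date",
    direction="backward",              # look only at earlier-or-equal rows
    tolerance=pd.Timedelta("10min"),   # ignore observations older than this
)
print(result)
```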
