Pandas DataFrame: how to compute statistics over a selection of previous rows while iterating over the DataFrame

Posted 2024-10-01 04:50:37


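I have to iterate over a datetime-indexed DataFrame (yes, I know iteration is frowned upon in the pandas community).

I know how to iterate with iterrows(), but that doesn't seem to let me "look back" at earlier rows.

Here is my code: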

import pandas as pd

data = [
['2018-04-25 18:37:00',       5862,        4427],
['2018-04-25 21:36:30',       6421,        4581],
['2018-04-25 22:13:00',       5948,        4779],
['2018-04-26 00:11:30',       5703,        4314],
['2018-04-26 02:27:00',       4988,        3868],
['2018-04-26 04:28:30',       4812,        3823],
['2018-04-26 06:22:30',       4347,        3672],
['2018-04-26 10:50:30',       3896,        3546],
['2018-04-26 12:04:30',       3478,        3557],
['2018-04-26 14:02:30',       3625,        3598],
['2018-04-26 15:31:30',       3751,        3606]
]

df = pd.DataFrame(data, columns=['datetime', 'discharge1', 'discharge2'])
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')

Then I iterate over the index and values:

for i, v in df.iterrows():
    print(f"{i},{v}")

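However, I need to do two things:

Get the integer position (i.e. the row number) of a given datetime
Apply a statistical function to a selection of previous rows. For simplicity, say I want to find the maximum of the previous 5 values in column "A" while iterating over the rows

What I'd like to do is something like this (pseudocode):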

start_datetime='2018-04-26 00:11:30'
start_pos = df.get_index_position_for_datetime(start_datetime)

for i in range(start_pos, len(df)):
    value = df.iloc[i,'discharge1'] - get_average_over(df.iloc[i,'discharge2']:df.iloc[i-5,'discharge2'])

How should I write this? Is it possible (or even necessary) to use vectorization in this case?


1 answer

User
#1 · Posted 2024-10-01 04:50:37

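Use DataFrame.rolling and take the mean:

import numpy as np

N = 5
# discharge1 minus the N-row rolling mean of discharge2 (window includes the current row)
df['value'] = df['discharge1'] - df['discharge2'].rolling(N).mean()
# Integer position (row number) of the start datetime
location = df.index.get_loc(start_datetime)
# Blank out the rows before the start datetime
df.loc[df.index < start_datetime, 'value'] = np.nan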
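For reference, here is a self-contained sketch that puts the pieces together on the question's data. It rests on one assumption not settled in the question: "the previous 5 rows" is taken to exclude the current row, which is why each series is shifted by one before the rolling window is applied. The names prev_mean, prev_max and result are illustrative only.

```python
import pandas as pd

data = [
    ['2018-04-25 18:37:00', 5862, 4427],
    ['2018-04-25 21:36:30', 6421, 4581],
    ['2018-04-25 22:13:00', 5948, 4779],
    ['2018-04-26 00:11:30', 5703, 4314],
    ['2018-04-26 02:27:00', 4988, 3868],
    ['2018-04-26 04:28:30', 4812, 3823],
    ['2018-04-26 06:22:30', 4347, 3672],
    ['2018-04-26 10:50:30', 3896, 3546],
    ['2018-04-26 12:04:30', 3478, 3557],
    ['2018-04-26 14:02:30', 3625, 3598],
    ['2018-04-26 15:31:30', 3751, 3606],
]

df = pd.DataFrame(data, columns=['datetime', 'discharge1', 'discharge2'])
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')

start_datetime = pd.Timestamp('2018-04-26 00:11:30')
N = 5

# Integer position (row number) of the start timestamp in the index.
start_pos = df.index.get_loc(start_datetime)

# Assumption: "previous N rows" excludes the current row, so shift by one
# before applying the rolling window.
prev_mean = df['discharge2'].shift(1).rolling(N).mean()
prev_max = df['discharge1'].shift(1).rolling(N).max()

# discharge1 minus the mean of the N preceding discharge2 values,
# restricted to rows from the start position onwards.
result = (df['discharge1'] - prev_mean).iloc[start_pos:]

print(start_pos)
print(result)
```

Rows that do not yet have N preceding values come out as NaN. If partial windows are acceptable, passing min_periods=1 to rolling() is one way to fill them in.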
