Loop through a DataFrame and copy rows to a new DataFrame based on a condition

Posted 2024-09-28 18:57:11


I have a DataFrame df with more than 6,000 rows of data, a datetime index in the form YYYY-MM-DD, and the columns ID, water_level and change.

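I want to:

  1. Loop over every value in the change column and identify the turning points
  2. When I find a turning point, copy the whole row, including the index, into a new DataFrame, e.g. turningpoints_df
  3. For each new turning point identified in the loop, append that row to the new DataFrame turningpoints_df, so that I end up with something like this: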
               ID    water_level    change
date           
2000-10-01      2         5.5        -0.01
2000-12-13     40        10.0         0.02
2001-02-10    150         1.1       -0.005
2001-07-29    201        12.4         0.01
...           ...         ...          ...

I was thinking of a position-based approach, something like this (purely illustrative):

turningpoints_df = pd.DataFrame(columns = ['ID', 'water_level', 'change'])

for i in range(len(df['change'])):
    if [i-1] < 0 and [i+1] > 0:
        #this is a min point and take this row and copy to turningpoints_df
    elif [i-1] > 0 and [i+1] < 0:
        #this is a max point and take this row and copy to turningpoints_df
    else: 
        pass 

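My problem is that I am not sure how to check each value in my change column against the values before and after it, and then how to extract that row into a new df when the condition is met.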


2 Answers

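You can use NumPy's roll() to shift a series forward or backward. That puts the previous and next values on the same row as the current one, so a simple function passed to apply() can carry out your logic, because everything it needs is on one row.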

import datetime as dt
import random
from decimal import Decimal

import numpy as np
import pandas as pd

# sample data: one row per day with a random ID, water level and change
d = list(pd.date_range(dt.datetime(2000,1,1), dt.datetime(2010,12,31)))
df = pd.DataFrame({"date":d, "ID":[random.randint(1,200) for x in d], 
     "water_level":[round(Decimal(random.uniform(1,13)),2) for x in d], 
      "change":[round(Decimal(random.uniform(-0.05, 0.05)),3) for x in d]})

# have ref to prev and next, just apply logic
def turningpoint(r):
    r["turningpoint"] = (r["prev_change"] < 0 and r["next_change"] > 0) or \
        (r["prev_change"] > 0 and r["next_change"] < 0)
    return r

# use numpy to shift "change" so have prev and next on same row as new columns
# initially default turningpoint boolean
df = df.assign(prev_change=np.roll(df["change"],1), 
          next_change=np.roll(df["change"],-1),
          turningpoint=False).apply(turningpoint, axis=1).drop(["prev_change", "next_change"], axis=1)
# first and last rows cannot be turning points (np.roll wraps around at the ends)
df.loc[df.index[0], "turningpoint"] = False
df.loc[df.index[-1], "turningpoint"] = False

# take a copy of all rows that are turningpoints into new df with index
df_turningpoint = df[df["turningpoint"]].copy()
df_turningpoint
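
As a side note on why the first and last rows are reset in the code above: np.roll is circular, so the "previous" value of the first row is really the last element, and the "next" value of the last row is really the first. A minimal sketch on a made-up five-value array (the values are purely illustrative, not taken from the question's data):

```python
import numpy as np

# hypothetical sample values, for illustration only
change = np.array([-0.01, 0.02, -0.005, 0.01, 0.03])

# element i holds change[i-1]; index 0 wraps around to the last value
prev_change = np.roll(change, 1)
# element i holds change[i+1]; the last index wraps around to the first value
next_change = np.roll(change, -1)

print(prev_change)
print(next_change)
```

Without the reset, those two wrapped values could mark an end row as a turning point by accident.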

It sounds like you want to use the DataFrame shift method.

# shift values down by 1:
df["change_down"] = df["change"].shift(1)

# shift values up by 1:
df["change_up"] = df["change"].shift(-1)

After that, you should be able to compare the values within each row and carry on with whatever you are trying to do.

for idx, row in df.iterrows():
    # check conditions here, e.g. compare row["change_down"] and row["change_up"]
    pass
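
The shift idea can also be carried through without an explicit loop. The following is only a sketch: it takes the question's illustrative condition literally (the neighbours on either side have opposite signs) and uses a small made-up DataFrame in the shape the question describes. The names prev_change, next_change, is_min and is_max are introduced here for illustration.

```python
import pandas as pd

# made-up sample data in the shape described in the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6],
        "water_level": [5.0, 5.5, 6.1, 5.8, 5.2, 5.9],
        "change": [0.02, -0.01, 0.03, 0.01, -0.02, 0.04],
    },
    index=pd.to_datetime(
        ["2000-10-01", "2000-10-02", "2000-10-03",
         "2000-10-04", "2000-10-05", "2000-10-06"]
    ),
)
df.index.name = "date"

prev_change = df["change"].shift(1)   # previous row's value (NaN on the first row)
next_change = df["change"].shift(-1)  # next row's value (NaN on the last row)

# the question's condition: the values before and after have opposite signs
is_min = (prev_change < 0) & (next_change > 0)
is_max = (prev_change > 0) & (next_change < 0)

# comparisons with NaN are False, so the first and last rows drop out on their own
turningpoints_df = df[is_min | is_max].copy()
print(turningpoints_df)
```

Because the boolean mask selects whole rows, the date index and all columns are carried into turningpoints_df, and no row-by-row appending is needed.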
