How to initialize the values of the first and last rows of each period after a resample operation?

Posted 2024-09-28 22:42:35


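For example, given a DataFrame indexed by 1h periods, I would like to set the values 0 and 1 in a new column on the rows where each new 5h period starts and ends, respectively.

Consider this input data, for example: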

import pandas as pd
from random import seed, randint
from collections import OrderedDict

p1h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

Result

df.head(10)

                  Values
p1h                     
2020-02-01 00:00       2
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1
2020-02-01 05:00       7
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6

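Is there a way to set a new column so as to get the following result? (The first and last rows of each period are initialized with 0 and 1, respectively.)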

df['period5h'] = df.resample('5h').???

df.head(10)

                  Values   period5h
p1h                     
2020-02-01 00:00       2          0   <- 1st row of 5h period
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1          1   <- last row of 5h period
2020-02-01 05:00       7          0   <- 1st row of 5h period
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6          1   <- last row of 5h period

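Can this be achieved with some built-in pandas functionality?

The final goal is then to fill the empty values by linear interpolation between 0 and 1, so as to get the % progress of the current row relative to its 5h period.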

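Another track / question

Another approach would be to initialize a second DataFrame with a 5h PeriodIndex, initialize the values of the new column to 1, then upsample the PeriodIndex back to 1h so that the two DataFrames can be merged.

A shift(-1) would then move the value onto the last row of each period.

I would repeat the process for the value 0, without the shift.

So, how can I create this new DataFrame so that it can be merged into the first one? I tried a few merge commands, but got an error stating that the two indexes have different frequencies.

Thanks for your help! Best regards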


3 answers

Not the most Pythonic approach, but it works.

import pandas as pd
from random import seed, randint
from collections import OrderedDict
import time
p1h = pd.period_range(start='2020-02-01 00:00', end='2040-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

t1 = time.time()
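col = df.columns.get_loc('Values')  # integer position of the target column
for i in range(len(df)):
  if (i+1) % 5 == 1:    # first row of each block of 5 rows
    df.iloc[i, col] = 0  # single .iloc call avoids chained assignment
  elif (i+1) % 5 == 0:  # last row of each block of 5 rows
    df.iloc[i, col] = 1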
t2 = time.time()
df.head(20)

print(t2-t1)


Time: 8.770591259002686 s

Method 2:

import pandas as pd
from random import seed, randint
from collections import OrderedDict
import time
p1h = pd.period_range(start='2020-02-01 00:00', end='2040-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

t1 = time.time()

col = df.columns.get_loc('Values')  # integer position of the target column
df.iloc[0::5, col] = 0  # first row of each block of 5 rows
df.iloc[4::5, col] = 1  # last row of each block of 5 rows
t2 = time.time()
df.head(20)

print(t2-t1)

Time: 0.009400367736816406 s
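
Both timed variants above write the markers into the existing Values column. The following is a minimal sketch of the same positional idea that writes into a new column instead, as the question asks. It assumes gap-free hourly data whose first row falls on a period boundary; the column name period5h is borrowed from the question, and the small 12-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Small hourly frame shaped like the question's data (values are arbitrary).
p1h = pd.period_range(start='2020-02-01 00:00', periods=12, freq='1h', name='p1h')
df = pd.DataFrame({'Values': range(12)}, index=p1h)

# Position of each row inside its block of 5 rows: 0, 1, 2, 3, 4, 0, 1, ...
pos = np.arange(len(df)) % 5

# 0.0 on the first row of a block, 1.0 on the fifth row, NaN everywhere else.
df['period5h'] = np.where(pos == 0, 0.0, np.where(pos == 4, 1.0, np.nan))

print(df)
```

Because this relies only on row positions, it would mislabel rows if hours were missing; in that case an index-aware approach is the safer choice.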

OK, I finally settled on the following approach, which is reasonably fast (no loop):

 super_pi = pd.period_range(start='2020-01-01 00:00', end='2020-06-01 00:00', freq='5h', name='p5h')
 super_df = pd.DataFrame({'End' : 1, 'Start' : 0}, index=super_pi).resample('1h').first()
 # We know last row is a 1 (end of period)
 super_df['End'] = super_df['End'].shift(-1, fill_value=1)
 super_df['Period'] = super_df[['End','Start']].sum(axis=1, min_count=1)

Result

 super_df.head(10)

                   End  Start  Period
 p5h                                 
 2020-01-01 00:00  NaN    0.0     0.0
 2020-01-01 01:00  NaN    NaN     NaN
 2020-01-01 02:00  NaN    NaN     NaN
 2020-01-01 03:00  NaN    NaN     NaN
 2020-01-01 04:00  1.0    NaN     1.0
 2020-01-01 05:00  NaN    0.0     0.0
 2020-01-01 06:00  NaN    NaN     NaN
 2020-01-01 07:00  NaN    NaN     NaN
 2020-01-01 08:00  NaN    NaN     NaN

Best regards
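
On the merge part of the question: a join on the index should go through once both sides are indexed by hourly periods, and the reported error is presumably what happens when one side is still at 5h frequency. Below is a minimal sketch under that assumption. The names starts, ends, markers and merged are hypothetical, and the marker series is built by slicing df.index rather than by the upsampling shown above.

```python
import pandas as pd

# Small hourly frame shaped like the question's data (values are arbitrary).
p1h = pd.period_range(start='2020-02-01 00:00', periods=10, freq='1h', name='p1h')
df = pd.DataFrame({'Values': range(10)}, index=p1h)

# Marker series indexed by hourly periods taken from df.index itself,
# so their index has the same frequency as the frame they will be joined to.
starts = pd.Series(0.0, index=df.index[::5])   # first row of each block of 5
ends = pd.Series(1.0, index=df.index[4::5])    # last row of each block of 5
markers = pd.concat([starts, ends]).sort_index().rename('period5h')

# Left join on the index; rows without a marker get NaN.
merged = df.join(markers)

print(merged)
```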

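Use the `indices` attribute of the resample object to find the first and last positions of each group. This works even if the data does not have a regular frequency, or has a frequency that does not evenly divide the resampling frequency. A group with only a single measurement gets set to 1, not 0. Then we set the values accordingly.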

i1 = [] # Integer positions of the last row of each group
i0 = [] # Integer positions of the first row of each group
for k, v in df.resample('5h').indices.items():
    i0.append(v[0])
    i1.append(v[-1])

df.loc[df.index[i0], 'period_5H'] = 0
df.loc[df.index[i1], 'period_5H'] = 1

                  Values  period_5H
p1h                                
2020-02-01 00:00       2        0.0
2020-02-01 01:00       9        NaN
2020-02-01 02:00       1        NaN
2020-02-01 03:00       4        NaN
2020-02-01 04:00       1        1.0
2020-02-01 05:00       7        0.0
2020-02-01 06:00       7        NaN
2020-02-01 07:00       7        NaN
2020-02-01 08:00      10        NaN
2020-02-01 09:00       6        1.0
2020-02-01 10:00       3        0.0
...
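
For the end goal stated in the question (the progress of each row within its 5h period), the NaN gaps between the 0 and 1 markers can be filled with `Series.interpolate`. This is a minimal sketch, assuming a marker column shaped like the output above; the markers are typed in by hand here, and the progress column name is hypothetical.

```python
import numpy as np
import pandas as pd

p1h = pd.period_range(start='2020-02-01 00:00', periods=10, freq='1h', name='p1h')
nan = np.nan

# Marker column as produced above: 0 at each period start, 1 at each period end.
df = pd.DataFrame(
    {'period_5H': [0.0, nan, nan, nan, 1.0, 0.0, nan, nan, nan, 1.0]},
    index=p1h,
)

# method='linear' treats the rows as equally spaced and ignores the index values.
df['progress'] = df['period_5H'].interpolate(method='linear')

print(df)
```

Since linear interpolation goes by row position, the result is only proportional to elapsed time when the rows are evenly spaced. A trailing period that has a 0 but no closing 1 would also need separate handling.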
