How can I initialize the values of the first and last rows of each period after a resample operation?

import pandas as pd
from random import seed, randint
from collections import OrderedDict

p1h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

df.head(10)

                  Values
p1h
2020-02-01 00:00       2
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1
2020-02-01 05:00       7
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6

df['period5h'] = df.resample('5h').???

df.head(10)

                  Values  period5h
p1h
2020-02-01 00:00       2         0   <- 1st row of 5h period
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1         1   <- last row of 5h period
2020-02-01 05:00       7         0   <- 1st row of 5h period
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6         1   <- last row of 5h period

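Another track / question

An alternative would be to initialize a second DataFrame with a 5h PeriodIndex, set the new column's value to 1, and then upsample its PeriodIndex back to 1h so the two DataFrames can be merged.

A shift(-1) would then mark the last row of each period.

I would repeat the process, without the shift, for the value 0.

So how do I create this new DataFrame so that it can be merged into the first one? I tried a few merge commands, but got an error saying the two indexes have different frequencies.

Thanks for your help! Best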

3 Answers

User

Answer 1 · edited 2024-09-28 22:42:35

It's not the most Pythonic approach, but it works.

import pandas as pd
from random import seed, randint
from collections import OrderedDict
import time
p1h = pd.period_range(start='2020-02-01 00:00', end='2040-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

# Note: this overwrites 'Values' in place; assign to a new column to keep the data.
# Index with a single .iloc call (row, column) to avoid chained assignment.
col = df.columns.get_loc('Values')

t1 = time.time()
for i in range(len(df)):
    if i % 5 == 0:      # first row of each 5-row block
        df.iloc[i, col] = 0
    elif i % 5 == 4:    # last row of each 5-row block
        df.iloc[i, col] = 1
t2 = time.time()
df.head(20)

print(t2-t1)

Time: 8.770591259002686 seconds

Method 2:

import pandas as pd
from random import seed, randint
from collections import OrderedDict
import time
p1h = pd.period_range(start='2020-02-01 00:00', end='2040-03-04 00:00', freq='1h', name='p1h')

seed(1)
values = [randint(0,10) for p in p1h]
df = pd.DataFrame({'Values' : values}, index=p1h)

# Note: this overwrites 'Values' in place; assign to a new column to keep the data.
col = df.columns.get_loc('Values')

t1 = time.time()

# Slice every 5th row in a single .iloc call (no chained assignment)
df.iloc[0::5, col] = 0  # first row of each 5-row block
df.iloc[4::5, col] = 1  # last row of each 5-row block
t2 = time.time()
df.head(20)

print(t2-t1)

Time: 0.009400367736816406 seconds
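Both methods above write into the existing Values column. A minimal sketch of the same position-based idea that leaves Values untouched and fills a separate column instead is shown here. The column name period5h is taken from the question; the shortened date range and the placeholder values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# One day of hourly periods (shortened range, for illustration only)
p1h = pd.period_range(start='2020-02-01 00:00', end='2020-02-02 00:00', freq='1h', name='p1h')
df = pd.DataFrame({'Values': range(len(p1h))}, index=p1h)

# Position of each row inside its 5-row block: 0, 1, 2, 3, 4, 0, 1, ...
pos = np.arange(len(df)) % 5

# 0 on the first row of a block, 1 on the last row, NaN everywhere else
df['period5h'] = np.select([pos == 0, pos == 4], [0.0, 1.0], default=np.nan)

print(df.head(10))
```

Like the two methods above, this relies on the data being strictly hourly with no gaps, and a trailing block shorter than 5 rows will not get a 1.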

User

Answer 2 · edited 2024-09-28 22:42:35

OK, I finally settled on the following approach, which is fairly fast (no loops).

 super_pi = pd.period_range(start='2020-01-01 00:00', end='2020-06-01 00:00', freq='5h', name='p5h')
 super_df = pd.DataFrame({'End' : 1, 'Start' : 0}, index=super_pi).resample('1h').first()
 # We know last row is a 1 (end of period)
 super_df['End'] = super_df['End'].shift(-1, fill_value=1)
 super_df['Period'] = super_df[['End','Start']].sum(axis=1, min_count=1)

Result

 super_df.head(10)

                   End  Start  Period
 p5h                                 
 2020-01-01 00:00  NaN    0.0     0.0
 2020-01-01 01:00  NaN    NaN     NaN
 2020-01-01 02:00  NaN    NaN     NaN
 2020-01-01 03:00  NaN    NaN     NaN
 2020-01-01 04:00  1.0    NaN     1.0
 2020-01-01 05:00  NaN    0.0     0.0
 2020-01-01 06:00  NaN    NaN     NaN
 2020-01-01 07:00  NaN    NaN     NaN
 2020-01-01 08:00  NaN    NaN     NaN
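To get markers like these onto the original hourly frame without running into the frequency mismatch mentioned in the question, one option is to align on timestamps with reindex instead of merging. This is only a sketch under assumptions: it converts the PeriodIndex with to_timestamp(), builds the 5h grid with date_range rather than by resampling, and the names ts, starts, ends and marks are hypothetical helpers, not part of the answer above.

```python
import pandas as pd

# Hourly frame as in the question (shortened range, placeholder values)
p1h = pd.period_range(start='2020-02-01 00:00', end='2020-02-02 00:00', freq='1h', name='p1h')
df = pd.DataFrame({'Values': range(len(p1h))}, index=p1h)

# Work on timestamps so both sides share the same kind of index
ts = df.index.to_timestamp()

# Start of every 5h period, and the last hourly row inside each period
starts = pd.date_range(ts[0], ts[-1], freq='5h')
ends = starts + pd.Timedelta(hours=4)

# 0 at period starts, 1 at period ends
marks = pd.concat([pd.Series(0.0, index=starts), pd.Series(1.0, index=ends)])

# Align to the hourly rows; hours without a marker become NaN
df['period5h'] = marks.reindex(ts).to_numpy()

print(df.head(10))
```

Because reindex aligns by label, any marker that falls outside the hourly frame is simply dropped.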

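Best

User

Answer 3 · edited 2024-09-28 22:42:35

Use the indices attribute of the resample object to find the first and last index of each group. This works even when the data has no fixed frequency, or has a frequency that does not evenly divide the resampling frequency. Groups that contain only a single measurement get set to 1, not 0. Then we set the values accordingly.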

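i1 = []  # integer positions (as used by .iloc) of the last row of each group
i0 = []  # integer positions (as used by .iloc) of the first row of each group
for k, v in df.resample('5h').indices.items():
    i0.append(v[0])
    i1.append(v[-1])

# Set 0 first, then 1, so that a single-row group ends up as 1
df.loc[df.index[i0], 'period_5H'] = 0
df.loc[df.index[i1], 'period_5H'] = 1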

                  Values  period_5H
p1h                                
2020-02-01 00:00       2        0.0
2020-02-01 01:00       9        NaN
2020-02-01 02:00       1        NaN
2020-02-01 03:00       4        NaN
2020-02-01 04:00       1        1.0
2020-02-01 05:00       7        0.0
2020-02-01 06:00       7        NaN
2020-02-01 07:00       7        NaN
2020-02-01 08:00      10        NaN
2020-02-01 09:00       6        1.0
2020-02-01 10:00       3        0.0
...
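The same first/last logic can also be sketched without a Python loop, by computing a bin number per row and flagging where each bin value first and last occurs. This is an assumption-laden sketch, not the method above: it uses a DatetimeIndex with made-up irregular timestamps, and it anchors the 5-hour bins on the first timestamp, which matches resample's default bins only when the data starts at midnight as it does here.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular data: some hours are missing
idx = pd.DatetimeIndex([
    '2020-02-01 00:00', '2020-02-01 01:00', '2020-02-01 03:00',
    '2020-02-01 05:00', '2020-02-01 06:00', '2020-02-01 09:00',
    '2020-02-01 12:00',
])
df = pd.DataFrame({'Values': range(len(idx))}, index=idx)

# Number of whole 5-hour blocks elapsed since the first timestamp
bins = (df.index - df.index[0]) // pd.Timedelta(hours=5)

# A row is first (last) in its bin if the bin value has not appeared before (after) it
is_first = ~bins.duplicated(keep='first')
is_last = ~bins.duplicated(keep='last')

flag = np.full(len(df), np.nan)
flag[is_first] = 0
flag[is_last] = 1   # applied second, so a single-row bin ends up as 1
df['period_5h'] = flag

print(df)
```

Here the row at 12:00 is alone in its bin, so it is flagged 1, consistent with the behaviour described above for single-measurement groups.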
