Pandas: how to compute a conditional rolling/cumulative maximum within groups

Index  close  bool   condrolmax
0      1      True   1
1      3      True   3
2      2      True   3
3      5      True   5
4      3      False  5
5      3      True   3   --> rolling/accumulative maximum reset (False cond above)
6      4      True   4
7      5      False  4
8      7      False  4
9      5      True   5   --> rolling/accumulative maximum reset (False cond above)
10     7      False  5
11     8      False  5
12     6      True   6   --> rolling/accumulative maximum reset (False cond above)
13     8      True   8
14     5      False  8
15     5      True   5   --> rolling/accumulative maximum reset (False cond above)
16     7      True   7
17     15     True   15
18     16     True   16

import pandas as pd

# Initialise the data as a dict of lists.
data = {'close': [1, 3, 2, 5, 3, 3, 4, 5, 7, 5, 7, 8, 6, 8, 5, 5, 7, 15, 16],
        'bool': [True, True, True, True, False, True, True, False, False, True,
                 False, False, True, True, False, True, True, True, True],
        'condrolmax': [1, 3, 3, 5, 5, 3, 4, 4, 4, 5, 5, 5, 6, 8, 8, 5, 7, 15, 16]}

# Create DataFrame
df = pd.DataFrame(data)
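Read off the expected `condrolmax` column, the rule appears to be: take a running maximum of `close` over the True rows, restart it at a True row that follows a False row, and let False rows carry the previous value forward. As a reference to test other solutions against, here is a minimal plain-Python sketch of that reading. The function name `cond_running_max` is made up for illustration, and the sketch assumes the first row is True (otherwise the leading entries are `None`).

```python
def cond_running_max(close, flag):
    """Running max of `close` over True rows, restarting at a True row after a False row."""
    out = []
    current = None     # running maximum since the last reset
    prev_flag = True   # treat the (non-existent) row before the first one as True
    for value, is_true in zip(close, flag):
        if is_true:
            if current is None or not prev_flag:
                current = value                # reset: first row, or True after False
            else:
                current = max(current, value)  # extend the running maximum
        # False rows leave `current` untouched, so the last value is carried forward
        out.append(current)
        prev_flag = is_true
    return out


close = [1, 3, 2, 5, 3, 3, 4, 5, 7, 5, 7, 8, 6, 8, 5, 5, 7, 15, 16]
flag = [True, True, True, True, False, True, True, False, False, True,
        False, False, True, True, False, True, True, True, True]
expected = [1, 3, 3, 5, 5, 3, 4, 4, 4, 5, 5, 5, 6, 8, 8, 5, 7, 15, 16]

result = cond_running_max(close, flag)
print(result == expected)
```

The loop is slow on large frames, but it is easy to read and useful as a correctness check for vectorised versions.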

3 answers

User

Answer 1 · edited 2024-09-30 20:29:09

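I'm not sure how linear algebra and vectorization could be used to speed this function up, but with list comprehensions I wrote a faster algorithm. First, define the function as: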

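import numpy as np

def faster_condrolmax(df):
    # Index of each row where `bool` is False, 0 otherwise
    df['cond_index'] = [df.index[i] if df['bool'][i] == False else 0 for i in df.index]
    # Largest such index among the rows before row i
    df['cond_comp_index'] = [np.max(df.cond_index[0:i]) for i in df.index]
    # Row 0 has no earlier rows, so its NaN becomes 0
    df['cond_comp_index'] = df['cond_comp_index'].fillna(0).astype(int)
    df['condrolmax'] = np.zeros(len(df.close))
    # Max of `close` over the slice [cond_comp_index[i]:i], or close[i] when that index is not below i
    df['condrolmax'] = [np.max(df.close[df.cond_comp_index[i]:i])
                        if df.cond_comp_index[i] < i else df.close[i]
                        for i in range(len(df.close))]
    return df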

Then install and load line_profiler:

!pip install line_profiler
%load_ext line_profiler

To see how long each line of code takes, run:

%lprun -f faster_condrolmax faster_condrolmax(df)

This gives: [image: Each line profiling results]

And, of course, to see how long the whole function takes:

%timeit faster_condrolmax(df)

This gives: [image: Total algorithm profiling result]

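If you use SeaBean's function, you can get better results: it takes about half the time of the function I suggested. However, the timing for SeaBean's function does not look stable, so you should run it on a larger dataset before deciding. This is because %timeit reports the following: [image: SeaBean's function profiling result]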
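To act on the advice about a larger dataset outside a notebook, the standard-library `timeit` module can repeat the measurement so that the spread is visible. The sketch below is only an illustration: the random data, its size and seed, and the stand-in function `grouped_cummax` (a groupby/cummax candidate) are assumptions, so swap in whichever function you want to time.

```python
import timeit

import numpy as np
import pandas as pd


def grouped_cummax(df):
    """Stand-in candidate: cumulative max of `close` within groups that
    start at a True row following a False row."""
    prev_false = ~df["bool"].shift(fill_value=True)  # row 0 never starts a new group
    group = (df["bool"] & prev_false).cumsum()
    return df.groupby(group)["close"].cummax()


# Assumed benchmark data: 100_000 random rows with a fixed seed.
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    "close": rng.integers(1, 100, size=n),
    "bool": rng.random(n) < 0.7,
})

# Five runs of five calls each; compare best and worst rather than one number.
calls = 5
times = timeit.repeat(lambda: grouped_cummax(big), number=calls, repeat=5)
per_call = [t / calls for t in times]
print(f"best {min(per_call):.4f}s, worst {max(per_call):.4f}s per call")
```

A large gap between the best and worst run suggests the measurement is noisy, in which case a bigger `n` or more repeats may give a steadier comparison.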

User

Answer 2 · edited 2024-09-30 20:29:09

First build the groups from your condition (bool changing from False to True) with cumsum, then apply rolling after the groupby:

group = (df['bool']&(~df['bool']).shift()).cumsum()
df.groupby(group)['close'].rolling(2, min_periods=1).max()

Output:

0     0      1.0
      1      3.0
      2      3.0
      3      5.0
      4      5.0
1     5      3.0
      6      4.0
      7      5.0
      8      7.0
2     9      5.0
      10     7.0
      11     8.0
3     12     6.0
      13     8.0
      14     8.0
4     15     5.0
      16     7.0
      17    15.0
      18    16.0
Name: close, dtype: float64

To insert it as a column:

df['condrolmax'] = df.groupby(group)['close'].rolling(2, min_periods=1).max().droplevel(0)

Output:

    close   bool  condrolmax
0       1   True         1.0
1       3   True         3.0
2       2   True         3.0
3       5   True         5.0
4       3  False         5.0
5       3   True         3.0
6       4   True         4.0
7       5  False         5.0
8       7  False         7.0
9       5   True         5.0
10      7  False         7.0
11      8  False         8.0
12      6   True         6.0
13      8   True         8.0
14      5  False         8.0
15      5   True         5.0
16      7   True         7.0
17     15   True        15.0
18     16   True        16.0

NB. If you want the rolling to include the boundaries, use min_periods=1 in rolling.
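One point worth checking before reusing this on other data: `rolling(2, min_periods=1)` takes the maximum over the current and the previous row only. On the sample data that happens to coincide with a running maximum inside each group, but it need not in general. The sketch below uses a made-up single-group frame (the column `group` and the values are assumptions) to compare it with `expanding()`, which covers every row of the group so far.

```python
import pandas as pd

# Made-up data: one group whose largest value sits in the first row.
demo = pd.DataFrame({"close": [5, 1, 1, 2], "group": [0, 0, 0, 0]})

# Window of 2: max of the current and previous row within the group.
window_max = (
    demo.groupby("group")["close"].rolling(2, min_periods=1).max().droplevel(0)
)

# Expanding window: max of all rows of the group up to the current one.
running_max = (
    demo.groupby("group")["close"].expanding().max().droplevel(0)
)

print(window_max.tolist())   # [5.0, 5.0, 1.0, 2.0]
print(running_max.tolist())  # [5.0, 5.0, 5.0, 5.0]
```

If a maximum since the last reset is what you need, `expanding().max()` or `cummax()` on the same groups is likely the safer choice.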

User

Answer 3 · edited 2024-09-30 20:29:09

You can set up the groups and then use cummax(), as follows:

# Set group: New group if current row `bool` is True and last row `bool` is False
g = (df['bool'] & (~df['bool']).shift()).cumsum()   

# Get cumulative max of column `close` within the group 
df['condrolmax'] = df.groupby(g)['close'].cummax()

Result:

print(df)

    close   bool  condrolmax
0       1   True           1
1       3   True           3
2       2   True           3
3       5   True           5
4       3  False           5
5       3   True           3
6       4   True           4
7       5  False           5
8       7  False           7
9       5   True           5
10      7  False           7
11      8  False           8
12      6   True           6
13      8   True           8
14      5  False           8
15      5   True           5
16      7   True           7
17     15   True          15
18     16   True          16
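The grouped outputs shown in the answers on this page differ from the `condrolmax` column in the question at rows 7, 8, 10 and 11, where `bool` is False. The question keeps the previous maximum there (4, 4, 5, 5), while a plain grouped cumulative max lets those rows raise it (5, 7, 7, 8). If the question's column is the target, one possible variant is to hide `close` on False rows before taking the cumulative max and then forward-fill. This is a sketch, not the answer author's method: the names `prev_false`, `masked` and `condrolmax_strict` are made up, and it assumes the first row is True so that no NaN remains before the integer cast.

```python
import pandas as pd

df = pd.DataFrame({
    "close": [1, 3, 2, 5, 3, 3, 4, 5, 7, 5, 7, 8, 6, 8, 5, 5, 7, 15, 16],
    "bool": [True, True, True, True, False, True, True, False, False, True,
             False, False, True, True, False, True, True, True, True],
})
expected = [1, 3, 3, 5, 5, 3, 4, 4, 4, 5, 5, 5, 6, 8, 8, 5, 7, 15, 16]

# New group whenever a True row follows a False row (row 0 stays in group 0).
prev_false = ~df["bool"].shift(fill_value=True)
g = (df["bool"] & prev_false).cumsum()

# Replace `close` with NaN on False rows so they cannot raise the maximum,
# take the cumulative max per group, then carry the last value forward.
masked = df["close"].where(df["bool"])
strict = masked.groupby(g).cummax().groupby(g).ffill()

df["condrolmax_strict"] = strict.astype(int)
print(df["condrolmax_strict"].tolist() == expected)
```

Using `shift(fill_value=True)` keeps the shifted column boolean, which avoids the NaN that a plain `shift()` puts in the first row.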
