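<p>Your question reminds me of the <code>lag</code> operation and <code>cumsum</code>, so here is one answer built on them. If your data is large, I think plain Python <code>list</code> and <code>tuple</code> objects would also do, since the standard library should have functions that cover this task.</p>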
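<p>As a rough sketch of the plain-Python route, assuming the rows are already ordered and that two consecutive rows belong together when they differ by at most <code>m</code> in the first value and <code>n</code> in the second. The helper name <code>group_runs</code> and the sample rows are illustrative only:</p>

```python
def group_runs(rows, m, n):
    """Split ordered (a, b) tuples into runs of neighbouring rows.

    A row joins the current run when it differs from the previous row
    by at most m in a and at most n in b; otherwise it starts a new run.
    """
    runs = []
    for row in rows:
        if runs:
            last_a, last_b = runs[-1][-1]  # previous row, i.e. the "lag"
            if abs(row[0] - last_a) <= m and abs(row[1] - last_b) <= n:
                runs[-1].append(row)
                continue
        runs.append([row])  # first row, or a break: open a new run
    return runs


rows = [(1, 26), (1, 27), (1, 28), (2, 22), (2, 24)]
result = group_runs(rows, m=1, n=1)
print(result)
# [[(1, 26), (1, 27), (1, 28)], [(2, 22)], [(2, 24)]]
```

<p>This makes a single pass and only keeps the previous row in view, so it avoids building a second, shifted copy of the data.</p>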
<h2>Step 1: Get the data</h2>
<pre><code># Generate reproducible sample data
import pandas as pd
from random import choices, seed

seed(1245)
# Column a: 20 sorted draws from 0-9; column b: 20 draws from 20-28
a = sorted(choices(range(0, 10), k=20))
b = choices(range(20, 29), k=20)
data = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])
</code></pre>
<h2>Step 2: Lag by one row</h2>
<pre><code># Lag operation: shift every row down by one; -999 is a sentinel for the missing first row
data_shift = data.shift(1, fill_value=-999)
data_shift.columns = ["a_last", "b_last"]
# Combine them so the function can be applied row by row.
# If your data is huge, just call the function on these two pieces of data directly.
data_flat = pd.concat([data, data_shift], axis=1)
data_flat.head()
</code></pre>
<p>Output:</p>
<pre><code>   a   b  a_last  b_last
0  1  26    -999    -999
1  1  27       1      26
2  1  28       1      27
3  2  22       1      28
4  2  24       2      22
</code></pre>
<h2>Step 3: Define a custom function, then group the observations</h2>
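<pre><code># Define your function with args m, n
def your_func(x, m, n):
    # Is this row close enough to the previous one in both columns?
    cond1 = abs(x.a - x.a_last) <= m
    cond2 = abs(x.b - x.b_last) <= n
    # 0 keeps the row in the current group, 1 starts a new group
    if cond1 and cond2:
        return 0
    else:
        return 1

# Calculate per row; the cumulative sum of the flags is the group id of each sample
groups = data_flat.apply(your_func, axis=1, m=1, n=1).cumsum()
# Get the result: one list of (a, b) tuples per group
data.groupby(groups).apply(lambda x: list(map(tuple, x.values)))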
</code></pre>
<p>Output:</p>
<pre><code>1 [(1, 26), (1, 27), (1, 28)]
2 [(2, 22)]
3 [(2, 24)]
4 [(3, 20)]
5 [(3, 26)]
6 [(4, 21), (4, 20)]
7 [(5, 28)]
8 [(5, 26), (5, 26)]
9 [(6, 28)]
10 [(6, 24)]
11 [(6, 28)]
12 [(7, 23)]
13 [(7, 26)]
14 [(8, 28), (8, 28)]
15 [(9, 26)]
dtype: object
</code></pre>
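<p>If the row-wise <code>apply</code> turns out to be slow on large data, the same lag-and-<code>cumsum</code> idea can probably be expressed with <code>diff</code> instead. This is only a sketch under the same assumptions (<code>m = n = 1</code>), run on a small hand-typed sample rather than the seeded data above:</p>

```python
import pandas as pd

# Small illustrative sample (the first five rows shown in Step 2)
data = pd.DataFrame({"a": [1, 1, 1, 2, 2], "b": [26, 27, 28, 22, 24]})
m, n = 1, 1

# diff() subtracts the previous row from each row; the first row becomes NaN
step = data.diff().abs()
# NaN <= m evaluates to False, so the first row always opens a new group
same_group = (step["a"] <= m) & (step["b"] <= n)
# Every break contributes 1 to the running total, which becomes the group id
groups = (~same_group).cumsum()

# Collect each group as a list of plain (a, b) tuples
result = {
    int(gid): list(g.itertuples(index=False, name=None))
    for gid, g in data.groupby(groups)
}
print(result)
# {1: [(1, 26), (1, 27), (1, 28)], 2: [(2, 22)], 3: [(2, 24)]}
```

<p>Here the NaN produced by <code>diff</code> plays the role of the <code>-999</code> fill value: both only guarantee that the first row starts a group.</p>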