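<p>Here is one approach that uses groupby twice: first to count the length of each run of consecutive Number values within an ID, then to average those run lengths per ID.</p>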
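<pre><code>import pandas as pd

# A jump of more than 1 from the previous row starts a new run; cumsum turns the flags into run ids
df1['tmp'] = (df1.Number - df1.Number.shift() > 1).cumsum()
# Count rows per (ID, run), then average those run lengths per ID
df1.groupby(['ID', 'tmp']).Number.count().groupby(level=0).mean().reset_index(name='avg_length')

# timeit: 2.29 ms ± 75.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#     ID  avg_length
# 0  400           3
# 1  500           2
</code></pre>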
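<p>To see the two groupby steps end to end, here is a minimal self-contained sketch. The <code>df1</code> below is hypothetical sample data made up for illustration (not the data from the question), and it assumes rows are already sorted by ID and then by Number. Because ID is part of the first groupby key, a run never spans two IDs even if the Number values happen to be consecutive across the boundary.</p>

```python
import pandas as pd

# Hypothetical sample data (not from the original question):
# ID 400 has two runs of length 3, ID 500 has two runs of length 2.
df1 = pd.DataFrame({
    'ID':     [400, 400, 400, 400, 400, 400, 500, 500, 500, 500],
    'Number': [1, 2, 3, 7, 8, 9, 20, 21, 30, 31],
})

# A gap larger than 1 marks the start of a new run; cumsum turns the flags into run ids
df1['tmp'] = (df1.Number - df1.Number.shift() > 1).cumsum()

# First groupby: length of each (ID, run); second groupby on the ID level: mean run length
result = (
    df1.groupby(['ID', 'tmp']).Number.count()
       .groupby(level=0).mean()
       .reset_index(name='avg_length')
)
print(result)
```

<p>Note that <code>mean()</code> returns floats, so on this sample data <code>avg_length</code> holds 3.0 and 2.0.</p>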
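<p>Option 2: instead of calling groupby twice, use a single groupby with apply, still reusing the tmp column created above. Within each ID, <code>value_counts()</code> on tmp gives the length of each run, and <code>mean()</code> averages them.</p>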
<pre><code># Within each ID, value_counts() of tmp is the length of each run; mean() averages them
df1.groupby('ID').tmp.apply(lambda x: x.value_counts().mean()).reset_index(name='avg_length')

# timeit: 2.25 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
</code></pre>
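<p>A quick way to check that the single-groupby version matches the double-groupby one is to run both on the same frame. This is a sketch on hypothetical sample data (invented for illustration, sorted by ID and then Number), not a benchmark; it only confirms the two expressions agree.</p>

```python
import pandas as pd

# Hypothetical sample data (not from the original question)
df1 = pd.DataFrame({
    'ID':     [400, 400, 400, 400, 400, 400, 500, 500, 500, 500],
    'Number': [1, 2, 3, 7, 8, 9, 20, 21, 30, 31],
})
# Run ids: a new run starts whenever Number jumps by more than 1
df1['tmp'] = (df1.Number - df1.Number.shift() > 1).cumsum()

# Option 2: one groupby on ID; value_counts() of tmp gives each run's length
opt2 = (
    df1.groupby('ID').tmp
       .apply(lambda x: x.value_counts().mean())
       .reset_index(name='avg_length')
)

# Option 1, for comparison: count per (ID, run), then average per ID
opt1 = (
    df1.groupby(['ID', 'tmp']).Number.count()
       .groupby(level=0).mean()
       .reset_index(name='avg_length')
)

# Both approaches should report the same average run length per ID
same = opt1['avg_length'].tolist() == opt2['avg_length'].tolist()
print(opt2)
print('same result:', same)
```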