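<p>I modified your dataset so that there are two such groups: one has 2 rows running from <code>N</code> to <code>Y</code>, and the other has 8 rows running from <code>N</code> to <code>Y</code>. Those counts include the row containing <code>Y</code>. If you do not count that row, the two groups contain 1 row and 7 rows respectively. You don't appear to have a time-series column, so I assume the rows are evenly spaced in time.</p>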
<pre><code>In [25]:
import numpy as np
import pandas as pd
df = pd.read_clipboard()
print(df)
playerID yearid votedBy ballots needed votes inducted category needed_note
3741 abbotji01 2005 BBWAA 516 387 13 N Player NaN
2860 aaronha01 1982 BBWAA 415 312 406 Y Player NaN
3743 abbotji01 2005 BBWAA 516 387 13 N Player NaN
146 adamsba01 1937 BBWAA 201 151 8 N Player NaN
259 adamsba01 1938 BBWAA 262 197 11 N Player NaN
384 adamsba01 1939 BBWAA 274 206 11 N Player NaN
497 adamsba01 1942 BBWAA 233 175 11 N Player NaN
574 adamsba01 1945 BBWAA 247 186 7 N Player NaN
2108 adamsbo03 1966 BBWAA 302 227 1 N Player NaN
2861 aaronha01 1982 BBWAA 415 312 406 Y Player NaN
In [26]:
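# True on rows where the player was inducted
df['isY'] = (df.inducted == 'Y')
# Running count of Y's, shifted down one row so that each Y row keeps
# the same group label as the N rows that precede it
df['isY'] = np.hstack((0, df['isY'].cumsum().values[:-1]))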
In [27]:
print(df.groupby('isY').count())
playerID yearid votedBy ballots needed votes inducted category needed_note isY
0 2 2 2 2 2 2 2 2 0 2
1 8 8 8 8 8 8 8 8 0 8
[2 rows x 10 columns]
</code></pre>
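<p>The same group label can also be built without NumPy by shifting the cumulative count of <code>Y</code> rows down by one position with <code>Series.shift(fill_value=0)</code> (available in pandas 0.24 and later). This is a minimal sketch on a hypothetical toy <code>inducted</code> column, not the data above:</p>
<pre><code>import pandas as pd

# Hypothetical toy data: two runs of N's, each closed by a Y
df = pd.DataFrame({"inducted": ["N", "Y", "N", "N", "N", "Y"]})

# Count the Y's seen so far, then shift down one row so the Y that closes
# a group carries the same label as the N rows before it
is_y = df["inducted"].eq("Y")
df["group"] = is_y.cumsum().shift(fill_value=0)

sizes = df.groupby("group").size()
print(sizes.tolist())  # [2, 4]
</code></pre>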
<p>Assuming the <code>Y</code> row is not counted, the average can be computed as follows:</p>
^{pr2}$
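<p>The original formula is not recoverable here. As one possible reading, this sketch assumes "average" means the mean number of <code>N</code> rows per group, with the closing <code>Y</code> row excluded. It uses the same hypothetical toy column as above. On the answer's data the per-group counts would be 1 and 7, giving a mean of 4.</p>
<pre><code>import pandas as pd

# Hypothetical toy data, same shape as the inducted column above
df = pd.DataFrame({"inducted": ["N", "Y", "N", "N", "N", "Y"]})
is_y = df["inducted"].eq("Y")
df["group"] = is_y.cumsum().shift(fill_value=0)

# Count only the N rows in each group, i.e. exclude the closing Y row
n_per_group = (~is_y).groupby(df["group"]).sum()
mean_n = n_per_group.mean()
print(n_per_group.tolist(), mean_n)  # [1, 3] 2.0
</code></pre>
<p>Rows after the last <code>Y</code> would form an unfinished group, so you may want to drop that group before taking the mean.</p>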