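<p>Some other solutions:</p>
<p>To aggregate with <code>sum</code>, select only the column <code>PWGTP</code> before summing. This matters when the DataFrame has more numeric columns, because otherwise every one of them is summed:</p>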
<pre><code>pov_rate = (df[df['POV'] <= 100].groupby('AgeGroups')['PWGTP'].sum() /
            df.groupby('AgeGroups')['PWGTP'].sum())
print(pov_rate)
</code></pre>
<p>With only one <code>groupby</code>, using a helper column <code>filt</code>:</p>
<pre><code>pov_rate = (df.assign(filt=df['PWGTP'].where(df['POV'] <= 100))
              .groupby('AgeGroups')[['filt', 'PWGTP']].sum()
              .eval('filt / PWGTP'))
print(pov_rate)
</code></pre>
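<p>As a small illustration of how the two approaches relate, here is a sketch on a tiny, made-up DataFrame (the values and age-group labels are assumptions chosen for the example, not data from the question). It suggests one difference worth knowing: when a group has no rows with <code>POV <= 100</code>, the two-<code>groupby</code> version yields <code>NaN</code> for that group, while the helper-column version yields <code>0.0</code>.</p>

```python
import pandas as pd

# Hypothetical toy data: the '65+' group has no row with POV <= 100.
df = pd.DataFrame({
    'AgeGroups': ['0-17', '0-17', '18-64', '18-64', '65+'],
    'POV': [50, 200, 100, 300, 400],
    'PWGTP': [10, 30, 20, 20, 5],
})

# Approach 1: filter first, then divide two grouped sums.
# Groups missing from the filtered data become NaN after index alignment.
rate_two = (df[df['POV'] <= 100].groupby('AgeGroups')['PWGTP'].sum() /
            df.groupby('AgeGroups')['PWGTP'].sum())

# Approach 2: a single groupby with the helper column 'filt'.
# 'filt' is NaN where POV > 100, and sum() skips NaN, so an
# all-NaN group sums to 0.
rate_one = (df.assign(filt=df['PWGTP'].where(df['POV'] <= 100))
              .groupby('AgeGroups')[['filt', 'PWGTP']].sum()
              .eval('filt / PWGTP'))

print(rate_two)
print(rate_one)
```

<p>If a rate of zero is the intended reading for such groups, calling <code>fillna(0)</code> on the first result makes the two outputs agree.</p>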
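<p><strong>Performance</strong> depends on the number of groups, the number of matched rows, the number of numeric columns, and the length of the DataFrame, so timings on real data will differ from the ones below.</p>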
<pre><code>import numpy as np
import pandas as pd

# Synthetic test data: 1,000,000 rows, up to 10,000 groups, three extra numeric columns
np.random.seed(2020)
N = 1000000
df = pd.DataFrame({'AgeGroups': np.random.randint(10000, size=N),
                   'POV': np.random.randint(50, 500, size=N),
                   'PWGTP': np.random.randint(100, size=N),
                   'a': np.random.randint(100, size=N),
                   'b': np.random.randint(100, size=N),
                   'c': np.random.randint(100, size=N)})
# print(df)
</code></pre>
<hr/>
<pre><code>In [13]: %%timeit
...: pov_rate = (df[df['POV'] <= 100].groupby('AgeGroups').sum()['PWGTP'] /
...: df.groupby('AgeGroups').sum()['PWGTP'])
...:
209 ms ± 7.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [14]: %%timeit
...: pov_rate = (df[df['POV'] <= 100].groupby('AgeGroups')['PWGTP'].sum() /
...: df.groupby('AgeGroups')['PWGTP'].sum())
...:
85.8 ms ± 332 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [15]: %%timeit
...: pov_rate = (df.assign(filt = df['PWGTP'].where(df['POV'] <= 100))
...: .groupby('AgeGroups')[['filt','PWGTP']].sum()
...: .eval('filt / PWGTP'))
...:
122 ms ± 388 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
</code></pre>
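<p>Outside IPython, where <code>%%timeit</code> is not available, a similar comparison can be sketched with the standard-library <code>timeit</code> module. The function names below are hypothetical labels for the three variants, and the row and group counts are reduced (an assumption made only to keep the run short), so the absolute numbers will not match the timings above.</p>

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2020)
N = 100_000  # assumption: smaller than the 1,000,000 rows used above
df = pd.DataFrame({'AgeGroups': np.random.randint(1000, size=N),
                   'POV': np.random.randint(50, 500, size=N),
                   'PWGTP': np.random.randint(100, size=N),
                   'a': np.random.randint(100, size=N)})


def sum_all_then_select():
    # Sums every numeric column, then picks PWGTP.
    return (df[df['POV'] <= 100].groupby('AgeGroups').sum()['PWGTP'] /
            df.groupby('AgeGroups').sum()['PWGTP'])


def select_then_sum():
    # Picks PWGTP first, so only one column is summed.
    return (df[df['POV'] <= 100].groupby('AgeGroups')['PWGTP'].sum() /
            df.groupby('AgeGroups')['PWGTP'].sum())


def single_groupby():
    # One groupby over the helper column 'filt' and PWGTP.
    return (df.assign(filt=df['PWGTP'].where(df['POV'] <= 100))
              .groupby('AgeGroups')[['filt', 'PWGTP']].sum()
              .eval('filt / PWGTP'))


timings = {}
for func in (sum_all_then_select, select_then_sum, single_groupby):
    # Best of 3 repeats, 3 calls per repeat, reported per call.
    best = min(timeit.repeat(func, number=3, repeat=3)) / 3
    timings[func.__name__] = best
    print(f'{func.__name__}: {best * 1000:.1f} ms per call')
```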