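<p>Note that as of pandas 0.23, passing a dictionary to <code>agg</code> on a grouped Series is deprecated and will be removed in a future version, so we can't rely on that approach.</p>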
<h2>Warning</h2>
<pre><code># mean_pos: a user-defined aggregation function (not shown here)
df = (not_cancelled.groupby(['year', 'month', 'day'])['arr_delay']
      .agg({'arr_delay': 'mean', 'arr_delay_2': mean_pos})
     )
# FutureWarning: using a dict on a Series for aggregation
# is deprecated and will be removed in a future version.
</code></pre>
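<p>On newer pandas (0.25 and later), named aggregation is a warning-free way to get both columns from a single <code>agg</code> call. This is not the approach this answer takes; it is a sketch on a small made-up frame standing in for <code>not_cancelled</code>, and the lambda is an assumed stand-in for <code>mean_pos</code> (mean of the positive delays only).</p>

```python
import pandas as pd

# Hypothetical toy data standing in for the `not_cancelled` flights table.
not_cancelled = pd.DataFrame({
    "year": [2013] * 6,
    "month": [1] * 6,
    "day": [1, 1, 1, 2, 2, 2],
    "arr_delay": [10.0, -5.0, 20.0, -3.0, 4.0, 8.0],
})

# Named aggregation: each keyword is an output column name and each value
# is the aggregation to apply. No dict renaming, so no FutureWarning.
df = (
    not_cancelled.groupby(["year", "month", "day"])["arr_delay"]
    .agg(
        arr_delay="mean",
        # Assumed equivalent of mean_pos: average only the positive delays.
        arr_delay_positive=lambda s: s[s > 0].mean(),
    )
    .reset_index()
)
print(df)
```

<p>The keyword names become the output column names directly, which is what the deprecated dict form was being used for.</p>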
<p>So, to work around this, I came up with another idea.</p>
<p>Create a new column in which every non-positive value is NaN, then do a regular groupby.</p>
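<p>A minimal sketch of this idea, assuming a toy <code>not_cancelled</code> frame (the values are made up) with the same <code>year</code>, <code>month</code>, <code>day</code> and <code>arr_delay</code> columns. <code>Series.where</code> turns the non-positive delays into NaN, and because <code>mean()</code> skips NaN by default, the grouped mean of the new column averages only the positive delays.</p>

```python
import pandas as pd

# Hypothetical toy data standing in for the `not_cancelled` flights table.
not_cancelled = pd.DataFrame({
    "year": [2013] * 6,
    "month": [1] * 6,
    "day": [1, 1, 1, 2, 2, 2],
    "arr_delay": [10.0, -5.0, 20.0, -3.0, 4.0, 8.0],
})

# New column: Series.where keeps values where the condition is True
# and puts NaN everywhere else, i.e. non-positive delays become NaN.
not_cancelled = not_cancelled.assign(
    arr_delay_positive=not_cancelled["arr_delay"].where(not_cancelled["arr_delay"] > 0)
)

# A plain groupby mean; NaN is skipped, so arr_delay_positive is the
# mean of the positive delays in each group.
df = (
    not_cancelled.groupby(["year", "month", "day"])[["arr_delay", "arr_delay_positive"]]
    .mean()
    .reset_index()
)
print(df)
```

<p>A day with no positive delays ends up with NaN in <code>arr_delay_positive</code>, since there is nothing left to average.</p>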
<p>This gives:</p>
<pre><code> year month day arr_delay arr_delay_positive
0 2013 1 1 12.651023 32.481562
1 2013 1 2 12.692888 32.029907
2 2013 1 3 5.733333 27.660870
3 2013 1 4 -1.932819 28.309764
4 2013 1 5 -1.525802 22.558824
</code></pre>
<h2>Sanity check</h2>
<pre><code># sanity check
a = not_cancelled.query("year == 2013 & month == 1 & day == 1")['arr_delay']
a = a[a > 0]
a.mean() # 32.48156182212581
</code></pre>
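<p>The single-day check above can be extended to every group at once. This sketch, again on made-up toy data rather than the real flights table, recomputes each day's positive-only mean by filtering first and asserts that it matches the NaN-column result.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the `not_cancelled` flights table.
not_cancelled = pd.DataFrame({
    "year": [2013] * 6,
    "month": [1] * 6,
    "day": [1, 1, 1, 2, 2, 2],
    "arr_delay": [10.0, -5.0, 20.0, -3.0, 4.0, 8.0],
})
keys = ["year", "month", "day"]

# Result of the NaN-column approach.
not_cancelled = not_cancelled.assign(
    arr_delay_positive=not_cancelled["arr_delay"].where(not_cancelled["arr_delay"] > 0)
)
df = not_cancelled.groupby(keys)[["arr_delay", "arr_delay_positive"]].mean().reset_index()

# Independent check: filter to positive delays first, then average per group.
expected = not_cancelled[not_cancelled["arr_delay"] > 0].groupby(keys)["arr_delay"].mean()
actual = df.set_index(keys)["arr_delay_positive"]

# Groups with no positive delays are missing from `expected` but NaN in
# `actual`, so align on the group keys before comparing.
expected = expected.reindex(actual.index)
assert np.allclose(actual, expected, equal_nan=True)
print("all groups match")
```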