Answers to: Python/Pandas: best way to group by criteria?

Python/Pandas: best way to group by criteria?

1 answer

Anonymous, 1 day ago

Edit. As Paul mentioned in the comments, there is a <code>pd.cut</code> function that is far more elegant than my original answer.

<pre><code># equal-width bins
df['inc_index'] = pd.cut(df.A, bins=4, labels=[1, 2, 3, 4])

# custom bin edges
df['inc_index'] = pd.cut(df.A, bins=[0, 20000, 30000, 40000, 50000],
                         labels=[1, 2, 3, 4])
</code></pre>

Note that the <code>labels</code> argument is optional. <code>pd.cut</code> produces an <a href="http://pandas.pydata.org/pandas-docs/stable/categorical.html" rel="nofollow noreferrer">ordered categorical</a>, so you can sort by the resulting column regardless of the labels. For example, cutting a column <code>A</code> of random integers at the bin edges 0, 7, 13, 15 and 20, without labels, and then sorting by <code>inc_index</code> gives output like this (modulo random numbers):

<pre class="lang-none prettyprint-override"><code>    A   B inc_index
6   2  16    (0, 7]
7   5   5    (0, 7]
3  12   6   (7, 13]
4  10   8   (7, 13]
5   9  13   (7, 13]
1  15  10  (13, 15]
2  15   7  (13, 15]
8  15  13  (13, 15]
0  18  10  (15, 20]
9  16  12  (15, 20]
</code></pre>

<hr/>

Original answer. This generalizes <a href="https://stackoverflow.com/a/36345213/1391671">Alexander's answer</a> to variable bucket widths. You can build the <code>inc_index</code> column with <code>Series.apply</code>. For example:

<pre><code>def bucket(v):
    # of course, the thresholds can be arbitrary
    if v < 20000:
        return 1
    if v < 30000:
        return 2
    if v < 40000:
        return 3
    return 4

df['inc_index'] = df.mn_earn_wne_p6.apply(bucket)
</code></pre>

Or, if you really want to avoid a <code>def</code>:

<pre><code>df['inc_index'] = df.mn_earn_wne_p6.apply(
    lambda v: 1 if v < 20000 else 2 if v < 30000 else 3 if v < 40000 else 4)
</code></pre>

Note that if you only want to split the range of <code>mn_earn_wne_p6</code> into equal-width buckets, Alexander's approach is cleaner and faster:

<pre><code>df['inc_index'] = df.mn_earn_wne_p6 // bucket_width
</code></pre>

Then, to get the result you want, sort by this column:

<pre><code>df.sort_values('inc_index')
</code></pre>

You can also <code>groupby('inc_index')</code> to aggregate the results within each bucket.
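To tie the <code>pd.cut</code>, sorting and <code>groupby</code> steps together, here is a minimal self-contained sketch. It is an illustration rather than the answerer's exact code: the values in columns <code>A</code> and <code>B</code> are copied from the output table above, while the names <code>sorted_df</code> and <code>summary</code> and the choice of aggregations are assumptions made for the example.

```python
import pandas as pd

# Values copied from the answer's output table (index 0..9); any numeric
# column would work the same way.
df = pd.DataFrame({
    "A": [18, 15, 15, 12, 10, 9, 2, 5, 15, 16],
    "B": [10, 10, 7, 6, 8, 13, 16, 5, 13, 12],
})

# Custom bin edges. With the default right=True each interval is
# right-closed, e.g. (0, 7], and the result is an ordered categorical.
df["inc_index"] = pd.cut(df["A"], bins=[0, 7, 13, 15, 20])

# Sorting follows the order of the bins, not the text of the labels.
sorted_df = df.sort_values("inc_index")

# Aggregate within each bucket (hypothetical choice of count and mean);
# observed=True leaves out buckets that contain no rows.
summary = df.groupby("inc_index", observed=True)["B"].agg(["count", "mean"])

print(sorted_df)
print(summary)
```

Because no <code>labels</code> are passed, the bucket column holds the intervals themselves, which keeps the bin boundaries visible in both the sorted frame and the aggregated summary.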
