<p>When grouping by two or more columns, remember to pass the column names as a list:</p>
<pre><code>import pandas as pd

# Sample data: one row per customer for Feb-2019
df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Count customers and sum spending within each group
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}

# The grouping keys are passed as a list of column names
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)
</code></pre>
<p>Returns:</p>
<pre><code>                           customer_id  monthly_spending
                                 count               sum
monthtly_purchases region
3                  A                 2               665
                   B                 2               657
5                  A                 2               635
                   B                 2               623
</code></pre>
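<p>If the two-level column header is not wanted, named aggregation (available in pandas 0.25 and later) is one way to get flat column names directly. The following is a minimal sketch on the same sample data, reduced to the columns that are used; the output names <code>n_customers</code> and <code>total_spending</code> are illustrative choices, not part of the original question.</p>

```python
import pandas as pd

# Same sample data as above, without the "month" column.
df = pd.DataFrame(
    [
        [32324, 342, 5, "A"],
        [34345, 293, 5, "A"],
        [45453, 212, 3, "A"],
        [34343, 453, 3, "A"],
        [53533, 112, 5, "B"],
        [12334, 511, 5, "B"],
        [99934, 123, 3, "B"],
        [21213, 534, 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "monthtly_purchases", "region"],
)

# Named aggregation: output_name=(source column, aggregation function).
# n_customers and total_spending are hypothetical names chosen for this sketch.
flat_df = (
    df.groupby(["monthtly_purchases", "region"])
    .agg(
        n_customers=("customer_id", "count"),
        total_spending=("monthly_spending", "sum"),
    )
    .reset_index()  # turn the group keys back into ordinary columns
)
print(flat_df)
```

<p>The result has a single header row with the columns <code>monthtly_purchases</code>, <code>region</code>, <code>n_customers</code> and <code>total_spending</code>, and the same counts and sums as the table above.</p>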
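<p>As requested in the comments, to unpack the MultiIndex (turning the index levels back into ordinary columns by creating a new index):</p>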
<pre><code>agg_df.reset_index(inplace=True)
print(agg_df)
</code></pre>
<p>Returns:</p>
<pre><code>   monthtly_purchases region  customer_id  monthly_spending
                                    count               sum
0                   3      A            2               665
1                   3      B            2               657
2                   5      A            2               635
3                   5      B            2               623
</code></pre>
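<p>After <code>reset_index</code> the columns are still a two-level MultiIndex, so each label is a tuple such as <code>("customer_id", "count")</code>. One possible way to collapse them into single-level names is to join the non-empty parts of each tuple. This is a sketch, and the underscore separator is an arbitrary choice:</p>

```python
import pandas as pd

# Same sample data as above, without the "month" column.
df = pd.DataFrame(
    [
        [32324, 342, 5, "A"],
        [34345, 293, 5, "A"],
        [45453, 212, 3, "A"],
        [34343, 453, 3, "A"],
        [53533, 112, 5, "B"],
        [12334, 511, 5, "B"],
        [99934, 123, 3, "B"],
        [21213, 534, 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "monthtly_purchases", "region"],
)

d = {"customer_id": ["count"], "monthly_spending": ["sum"]}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d).reset_index()

# The former index columns have an empty second level, e.g. ("region", ""),
# so empty parts are skipped before joining.
agg_df.columns = ["_".join(part for part in col if part) for col in agg_df.columns]
print(agg_df.columns.tolist())
```

<p>The column names become <code>monthtly_purchases</code>, <code>region</code>, <code>customer_id_count</code> and <code>monthly_spending_sum</code>; the values are unchanged.</p>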
<p>To include the month as well, as requested in the comments:</p>
<pre><code>agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)
</code></pre>
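<p>Assuming the DataFrame also contains rows for March-2019 (they are not part of the sample data above), this returns:</p>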
<pre><code>        month  monthtly_purchases region  customer_id  monthly_spending
                                                count               sum
0    Feb-2019                   3      A            2               665
1    Feb-2019                   3      B            2               657
2    Feb-2019                   5      A            2               635
3    Feb-2019                   5      B            2               623
4  March-2019                   3      A            2               666
5  March-2019                   3      B            2               858
6  March-2019                   5      A            2               596
7  March-2019                   5      B            2               577
</code></pre>