<p>The <code>groupby</code> function is not necessary here. For better performance, use <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html" rel="nofollow noreferrer"><code>DataFrame.duplicated</code></a> on multiple columns with the parameter <code>keep=False</code> to mark all duplicates, then filter with <a href="http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing" rel="nofollow noreferrer"><code>boolean indexing</code></a>:</p>
<pre><code>df = df[df.duplicated(['groups','ids'], keep=False)]
print(df)
   groups  ids  numbers
0  group3  id4       89
1  group1  id1       50
2  group1  id1       30
6  group3  id4       90
</code></pre>
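<p>To try this without the original data, here is a minimal sketch on a small hypothetical DataFrame (the values are invented so that the duplicated rows match the output above). It also shows how <code>keep=False</code> differs from the default <code>keep='first'</code>, and, for comparison, a <code>groupby</code>-based alternative using <code>GroupBy.filter</code>, which returns the same rows but is usually slower because it calls a Python function for every group:</p>
<pre><code>import pandas as pd

# Hypothetical sample data, invented for illustration (not the asker's DataFrame).
df = pd.DataFrame({
    'groups':  ['group3', 'group1', 'group1', 'group2', 'group2', 'group1', 'group3'],
    'ids':     ['id4', 'id1', 'id1', 'id2', 'id3', 'id2', 'id4'],
    'numbers': [89, 50, 30, 10, 20, 70, 90],
})

# keep=False: every row whose (groups, ids) pair occurs more than once is True.
mask_all = df.duplicated(['groups', 'ids'], keep=False)
# Default keep='first': the first occurrence of each pair stays False.
mask_first = df.duplicated(['groups', 'ids'])

dups = df[mask_all]  # boolean indexing keeps rows 0, 1, 2 and 6

# groupby alternative with the same result; runs a Python lambda per group.
via_groupby = df.groupby(['groups', 'ids']).filter(lambda g: len(g) > 1)

print(dups)
print(mask_first.tolist())
</code></pre>
<p>With <code>keep='first'</code> only the second <code>group1</code>/<code>id1</code> row and the second <code>group3</code>/<code>id4</code> row are flagged, so filtering on that mask would drop the first member of each duplicated pair.</p>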
<p>If sorting is needed, add <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html" rel="nofollow noreferrer"><code>DataFrame.sort_values</code></a>, and <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html" rel="nofollow noreferrer"><code>DataFrame.reset_index</code></a> with <code>drop=True</code> to get a default index:</p>
<pre><code>df = (df[df.duplicated(['groups','ids'], keep=False)]
.sort_values(['groups','ids'])
.reset_index(drop=True))
print(df)
   groups  ids  numbers
0  group1  id1       50
1  group1  id1       30
2  group3  id4       89
3  group3  id4       90
</code></pre>
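<p>A minimal sketch of the sorted variant on a small hypothetical DataFrame (values invented for illustration), which also shows what <code>reset_index</code> does without <code>drop=True</code>: the old row labels are kept as a new <code>index</code> column instead of being discarded.</p>
<pre><code>import pandas as pd

# Hypothetical sample data, invented for illustration (not the asker's DataFrame).
df = pd.DataFrame({
    'groups':  ['group3', 'group1', 'group1', 'group2', 'group2', 'group1', 'group3'],
    'ids':     ['id4', 'id1', 'id1', 'id2', 'id3', 'id2', 'id4'],
    'numbers': [89, 50, 30, 10, 20, 70, 90],
})

dup_mask = df.duplicated(['groups', 'ids'], keep=False)

out = (df[dup_mask]
       .sort_values(['groups', 'ids'])
       .reset_index(drop=True))  # old labels discarded, new index is 0..n-1

# Without drop=True the old labels survive as an extra 'index' column.
kept = (df[dup_mask]
        .sort_values(['groups', 'ids'])
        .reset_index())

print(out)
print(kept)
</code></pre>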