<p>I would use <code>groupby</code>, overwrite the value wherever a group has more than one row, and then unstack:</p>
<pre><code># replace count with nunique if necessary
# pass a list (not a set) so the column order is deterministic
new_df = df.groupby(['ref','Sam'])['Class'].agg(['count','first'])
new_df.loc[new_df['count'].gt(1), 'first'] = 'MultiHit'
new_df['first'].unstack('Sam')
</code></pre>
<p>Output:</p>
<pre><code>Sam         1         2
ref
A    MultiHit        v1
B          v1  MultiHit
C          v2        v1
</code></pre>
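<p>For reference, here is a minimal end-to-end sketch of the <code>groupby</code> approach. The input frame is not shown in the question, so the <code>df</code> below is a hypothetical one made up to reproduce the output above; only the column names <code>ref</code>, <code>Sam</code> and <code>Class</code> come from the original code.</p>

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's real frame):
# (A, 1) and (B, 2) each occur twice, every other pair occurs once.
df = pd.DataFrame({
    'ref':   ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Sam':   [1, 1, 2, 1, 2, 2, 1, 2],
    'Class': ['v1', 'v2', 'v1', 'v1', 'v1', 'v2', 'v2', 'v1'],
})

# Count the rows and keep the first Class for every (ref, Sam) pair.
new_df = df.groupby(['ref', 'Sam'])['Class'].agg(['count', 'first'])

# Pairs with more than one row are flagged instead of keeping a single Class.
new_df.loc[new_df['count'].gt(1), 'first'] = 'MultiHit'

# Move the Sam level of the index into the columns.
result = new_df['first'].unstack('Sam')
print(result)
```

<p>If duplicated rows with the same <code>Class</code> should not count as multiple hits, swapping <code>count</code> for <code>nunique</code> (as the comment in the answer suggests) would likely be the right change.</p>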
<hr/>
<p>You can also use <code>pivot_table</code>, which avoids having to unstack afterwards:</p>
<pre><code>import numpy as np

new_df = df.pivot_table(index='ref',
                        columns='Sam',
                        values='Class',
                        aggfunc=['count', 'first'])
# flag every cell that aggregates more than one row
new_df.loc[:, 'first'] = np.where(new_df.loc[:, 'count'].gt(1),
                                  'MultiHit',
                                  new_df.loc[:, 'first'])
new_df.loc[:, 'first']
</code></pre>
<p>This gives the same output.</p>
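<p>A possible variant of the pivot approach, sketched under the same assumptions: build the counts and the first values as two separate pivot tables and combine them with <code>DataFrame.mask</code> instead of <code>np.where</code>. This is a substitution on my part, not what the answer above does, and the sample <code>df</code> is again hypothetical.</p>

```python
import pandas as pd

# Hypothetical sample data (an assumption): (A, 1) and (B, 2) occur twice.
df = pd.DataFrame({
    'ref':   ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Sam':   [1, 1, 2, 1, 2, 2, 1, 2],
    'Class': ['v1', 'v2', 'v1', 'v1', 'v1', 'v2', 'v2', 'v1'],
})

# One pivot table per aggregation keeps the columns as a plain Sam index.
counts = df.pivot_table(index='ref', columns='Sam',
                        values='Class', aggfunc='count')
firsts = df.pivot_table(index='ref', columns='Sam',
                        values='Class', aggfunc='first')

# Replace the first value wherever the cell aggregates more than one row.
pivot_result = firsts.mask(counts.gt(1), 'MultiHit')
print(pivot_result)
```

<p>Keeping the two tables separate avoids assigning into a MultiIndex column slice, at the cost of pivoting twice.</p>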