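<p>First, build a <code>set</code> of authors for each group and store it in a new column. Then take the difference with the <code>Author</code> column so each row keeps only its co-authors, remove the empty sets with <a href="http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing" rel="nofollow noreferrer"><code>boolean indexing</code></a>, flatten the values per author into a new set to keep them unique, and finally get the length:</p>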
<pre><code># attach the set of all authors of each book to every row of that book
df = df.join(df.groupby('BookID')['Author'].apply(set).rename('new'), on='BookID')
# remove the row's own author, so only co-authors remain
df['new'] = [b - {a} for a, b in zip(df['Author'], df['new'])]
# drop rows with empty sets, then flatten the sets per author into unique values
df = (df[df['new'].astype(bool)].groupby('Author')['new']
        .apply(lambda x: tuple(set([z for y in x for z in y])))
        .to_frame())
# count the unique co-authors
df.insert(0, 'Num_Unique_CoAuthors', df['new'].str.len())
print (df)

        Num_Unique_CoAuthors                       new
Author
Alex                       4  (Max, John, Jenna, Mary)
Jenna                      2              (John, Alex)
John                       2             (Jenna, Alex)
Mary                       2               (Max, Alex)
Max                        2              (Mary, Alex)
</code></pre>
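<p>For anyone who wants to try the steps end to end, here is a minimal self-contained sketch. The sample <code>DataFrame</code> is hypothetical (the question's data is not reproduced here); it was only chosen to be consistent with the output shown above. The co-authors are also sorted, which is an addition to the original approach, so the tuples come out in a stable order:</p>

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the question's data):
# two books with three authors each, Alex appearing in both.
df = pd.DataFrame({
    'BookID': [1, 1, 1, 2, 2, 2],
    'Author': ['Alex', 'John', 'Jenna', 'Alex', 'Mary', 'Max'],
})

# Attach the set of all authors of each book to every row of that book.
df = df.join(df.groupby('BookID')['Author'].apply(set).rename('new'), on='BookID')

# Remove the row's own author, leaving only co-authors.
df['new'] = [b - {a} for a, b in zip(df['Author'], df['new'])]

# Keep rows with at least one co-author, then union the sets per author.
# sorted() is an addition here so the tuple order is deterministic.
result = (df[df['new'].astype(bool)]
          .groupby('Author')['new']
          .apply(lambda x: tuple(sorted(set().union(*x))))
          .to_frame())

# Count the unique co-authors per author.
result.insert(0, 'Num_Unique_CoAuthors', result['new'].str.len())
print(result)
```

<p>With this sample data, Alex ends up with four unique co-authors and every other author with two. An author whose books have no other authors would be dropped by the boolean indexing step and would not appear in the result.</p>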