<p>An alternative approach</p>
<p>First, group by <code>BookID</code> and <a href="https://stackoverflow.com/a/53088007/4057186">list all authors per book (i.e. list all authors per group)</a>:</p>
<pre><code>combos = df.groupby('BookID').agg(lambda x: list(x)).reset_index(drop=False)
print(combos)
   BookID               Author
0       1  [John, Alex, Jenna]
1       2         [John, Alex]
2       3               [John]
3       4    [Alex, Mary, Max]
</code></pre>
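<p>For anyone who wants to reproduce this step, here is a minimal sketch. It assumes the input <code>df</code> looks like the data printed in this answer (the construction below is reconstructed from that output, not taken from the question), and it uses <code>agg(list)</code> on the <code>Author</code> column as a shorter equivalent of the lambda.</p>

```python
import pandas as pd

# Sample data reconstructed from the output printed in this answer (an assumption)
df = pd.DataFrame({
    'BookID': [1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Author': ['John', 'Alex', 'Jenna', 'John', 'Alex',
               'John', 'Alex', 'Mary', 'Max'],
})

# Collect the authors of each book into a list (one row per BookID)
combos = df.groupby('BookID')['Author'].agg(list).reset_index()
print(combos)
```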
<p>Next, merge with the main data on <code>BookID</code> to get, for each author, the full author list of every book they appear in:</p>
<pre><code>merged = combos.merge(df, how='inner', on='BookID')
print(merged)
   BookID             Author_x Author_y
0       1  [John, Alex, Jenna]     John
1       1  [John, Alex, Jenna]     Alex
2       1  [John, Alex, Jenna]    Jenna
3       2         [John, Alex]     John
4       2         [John, Alex]     Alex
5       3               [John]     John
6       4    [Alex, Mary, Max]     Alex
7       4    [Alex, Mary, Max]     Mary
8       4    [Alex, Mary, Max]      Max
</code></pre>
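<p>As a side note, a merge on <code>BookID</code> can also produce the co-author pairs directly. The sketch below is a different technique from the one in this answer (a self-merge of <code>df</code> rather than a merge with the grouped lists); the <code>_co</code> suffix and the variable names are made up for illustration, and the sample data is reconstructed from the printed output.</p>

```python
import pandas as pd

# Sample data reconstructed from the output printed in this answer (an assumption)
df = pd.DataFrame({
    'BookID': [1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Author': ['John', 'Alex', 'Jenna', 'John', 'Alex',
               'John', 'Alex', 'Mary', 'Max'],
})

# Self-merge on BookID: every ordered pair of authors that share a book
pairs = df.merge(df, on='BookID', suffixes=('', '_co'))
# Drop the rows where an author is paired with themselves
pairs = pairs[pairs['Author'] != pairs['Author_co']]
# Count distinct co-authors per author
counts = pairs.groupby('Author')['Author_co'].nunique()
# Authors without any co-author vanish in the filter, so add them back as 0
counts = counts.reindex(df['Author'].unique(), fill_value=0)
print(counts)
```

<p>This skips the list columns entirely, at the cost of a larger intermediate frame when books have many authors.</p>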
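<p><code>Author_x</code> is the full list of authors and includes <code>Author_y</code>. The full author list (<code>Author_x</code>) can now be compared against each individual/unique author (<code>Author_y</code>) with the following steps:</p>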
<ol>
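<li><a href="https://stackoverflow.com/a/3869503/4057186">Create a dict whose keys are the unique <code>Author_y</code> values</a> (i.e. the unique authors) and whose values are empty lists</li>
<li>Iterate over each key-value pair in the dict</li>
<li>Slice the merged DataFrame from the step above using the <code>Author_y</code> column; this gives the author lists of every book written by the author in the dict key</li>
<li>Get all the author lists (<code>Author_x</code>) from the slice as one flattened list</li>
<li><a href="https://stackoverflow.com/a/252711/4057186">Extend the empty list</a> with the <a href="https://stackoverflow.com/a/3462160/4057186">difference between the flattened list (all authors) and the dict key</a></li>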
</ol>
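<pre><code># One empty list per unique author
d = {auth: [] for auth in df['Author'].unique()}
for k, v in d.items():
    # Author lists of every book that author k appears in
    all_auths = merged[merged['Author_y'] == k]['Author_x'].values.tolist()
    # Flatten the nested lists into a single list of names
    auths = [coauths for nested in all_auths for coauths in nested]
    # Keep the unique names, excluding the author themselves
    v.extend(list(set(auths) - set([k])))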
</code></pre>
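<p>To make the flatten-and-difference steps concrete, here is a small worked example for a single author. The nested list is what the <code>Author_x</code> slice for <code>John</code> would contain in the sample data; the <code>sorted</code> call is only added here to make the result order predictable, since sets are unordered.</p>

```python
# Author lists of the three books John appears in (taken from the merged output)
all_auths = [['John', 'Alex', 'Jenna'], ['John', 'Alex'], ['John']]

# Flatten the nested lists into one list of names
auths = [name for nested in all_auths for name in nested]

# The set difference removes duplicates and John himself
coauths = sorted(set(auths) - {'John'})
print(coauths)  # ['Alex', 'Jenna']
```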
<p>Finally, put the dict into a <code>DataFrame</code> and count the non-null values in each row:</p>
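<pre><code># One column per co-author slot, sized to the longest co-author list
max_coauths = max(len(v) for v in d.values())
cnames = ['coauth'+str(k) for k in range(1, max_coauths+1)]
df_summary = pd.DataFrame.from_dict(d, orient='index', columns=cnames)
# Count the non-null co-author cells in each row
df_summary['Num_Unique_CoAuthors'] = df_summary[cnames].notna().sum(axis=1)
df_summary = df_summary.rename_axis('author').reset_index()
print(df_summary)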
  author coauth1 coauth2 coauth3 coauth4  Num_Unique_CoAuthors
0   John    Alex   Jenna    None    None                     2
1   Alex     Max    John    Mary   Jenna                     4
2  Jenna    John    Alex    None    None                     2
3   Mary     Max    Alex    None    None                     2
4    Max    Alex    Mary    None    None                     2
</code></pre>
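<p><strong>Extended data case</strong></p>
<p>If the main data contains a single-author book (i.e. an author without any co-authors), this approach prints zero for that author's row.</p>
<p>Below is a dummy row added to the data, with only a single author:</p>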
<pre><code>print(df)
   BookID Author
0       1   John
1       1   Alex
2       1  Jenna
3       2   John
4       2   Alex
5       3   John
6       4   Alex
7       4   Mary
8       4    Max
9       5    Tom
</code></pre>
<p>This is the output:</p>
<pre><code>  author coauth1 coauth2 coauth3 coauth4  Num_Unique_CoAuthors
0   John   Jenna    Alex    None    None                     2
1   Alex    Mary    John   Jenna     Max                     4
2  Jenna    John    Alex    None    None                     2
3   Mary     Max    Alex    None    None                     2
4    Max    Mary    Alex    None    None                     2
5    Tom    None    None    None    None                     0
</code></pre>
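<p>The counts in this extended case can be cross-checked without the merge, using plain Python sets. This is only a sanity-check sketch, not the method described above; the <code>coauthors</code> dict and the variable names are hypothetical, and the data is the extended sample shown in this section.</p>

```python
import pandas as pd

# Extended sample data from this answer, including the single-author book 5
df = pd.DataFrame({
    'BookID': [1, 1, 1, 2, 2, 3, 4, 4, 4, 5],
    'Author': ['John', 'Alex', 'Jenna', 'John', 'Alex',
               'John', 'Alex', 'Mary', 'Max', 'Tom'],
})

# Hypothetical helper structure: author -> set of co-authors
coauthors = {}
for _, group in df.groupby('BookID'):
    book_authors = set(group['Author'])
    for author in book_authors:
        # Everyone else on the same book is a co-author
        coauthors.setdefault(author, set()).update(book_authors - {author})

# Number of unique co-authors per author; Tom ends up with an empty set
num_coauthors = pd.Series({a: len(s) for a, s in coauthors.items()},
                          name='Num_Unique_CoAuthors')
print(num_coauthors.sort_index())
```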
<p><strong>Initial answer</strong></p>
<p>Have you tried a <code>groupby</code> with a <code>sum</code> aggregation?</p>
<pre><code>df.groupby(['Author'])['BookID'].sum()
</code></pre>
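<p>For reference, here is a small sketch of what this aggregation returns on the sample data (reconstructed from the output printed above). It adds up the <code>BookID</code> values per author; the <code>nunique</code> line is included only as a contrast, since it counts distinct books per author instead.</p>

```python
import pandas as pd

# Sample data reconstructed from the output printed in this answer (an assumption)
df = pd.DataFrame({
    'BookID': [1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Author': ['John', 'Alex', 'Jenna', 'John', 'Alex',
               'John', 'Alex', 'Mary', 'Max'],
})

# Sum of the BookID values per author (e.g. John: 1 + 2 + 3)
book_id_sums = df.groupby(['Author'])['BookID'].sum()
# Number of distinct books per author, shown for comparison
books_per_author = df.groupby(['Author'])['BookID'].nunique()
print(book_id_sums)
print(books_per_author)
```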