<p>Filter inside <code>groupby</code> + <code>apply</code>:</p>
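<pre><code># Dog numbers that actually exist in the frame
idx = set(all_cites_dog['Dog_Number'])
# Keep only cited dogs that are themselves listed; store the result under a
# new name so the original DataFrame is not overwritten
out = (all_cites_dog.groupby('Dog_Number')['Cites_Dogs']
       .apply(lambda x: [y for y in x if y in idx]))
print(out)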
Dog_Number
DOG123 [DOG127]
DOG126 []
DOG127 [DOG123]
Name: Cites_Dogs, dtype: object
</code></pre>
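<p>The input frame is not shown above, so here is a minimal self-contained sketch of the same filtering step. The sample rows (including the unlisted IDs <code>DOG999</code>, <code>DOG888</code> and <code>DOG777</code>) and the name <code>filtered</code> are assumptions, chosen only so that the result matches the output printed above:</p>

```python
import pandas as pd

# Hypothetical sample data: the original DataFrame is not shown in the answer.
all_cites_dog = pd.DataFrame({
    "Dog_Number": ["DOG123", "DOG123", "DOG126", "DOG127", "DOG127"],
    "Cites_Dogs": ["DOG127", "DOG999", "DOG888", "DOG123", "DOG777"],
})

# Set of dog numbers that exist in the frame (fast membership tests).
idx = set(all_cites_dog["Dog_Number"])

# For each dog, keep only the cited dogs that are themselves listed.
filtered = (
    all_cites_dog.groupby("Dog_Number")["Cites_Dogs"]
    .apply(lambda x: [y for y in x if y in idx])
)
print(filtered.to_dict())
```

<p>Because the list comprehension runs once per group, a dog whose citations are all unlisted still appears in the result, with an empty list.</p>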
<p>For better performance, first filter the rows with <a href="http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing" rel="nofollow noreferrer"><code>boolean indexing</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html" rel="nofollow noreferrer"><code>isin</code></a>, then apply <code>groupby</code>, and finally add empty lists for the dogs that had no matching rows:</p>
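<pre><code>import numpy as np
import pandas as pd

# Keep only rows whose cited dog is itself listed, then collect lists per dog
s = (all_cites_dog[all_cites_dog['Cites_Dogs'].isin(all_cites_dog['Dog_Number'].unique())]
     .groupby('Dog_Number')['Cites_Dogs']
     .apply(list))
# Dogs that lost all their rows in the filter each get their own empty list
idx = np.setdiff1d(all_cites_dog['Dog_Number'].unique(), s.index)
s1 = pd.Series([[] for _ in idx], index=idx)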
print (s1)
DOG126 []
dtype: object
# Series.append was removed in pandas 2.0; pd.concat does the same job
s = pd.concat([s, s1]).sort_index()
print(s)
DOG123 [DOG127]
DOG126 []
DOG127 [DOG123]
dtype: object
</code></pre>
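<p>For reference, the filter-first approach can be sketched end to end as follows. The sample rows and the names <code>known</code>, <code>missing</code> and <code>result</code> are assumptions made for illustration; <code>pd.concat</code> is used because <code>Series.append</code> is no longer available in pandas 2.0 and later:</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the original DataFrame is not shown in the answer.
all_cites_dog = pd.DataFrame({
    "Dog_Number": ["DOG123", "DOG123", "DOG126", "DOG127", "DOG127"],
    "Cites_Dogs": ["DOG127", "DOG999", "DOG888", "DOG123", "DOG777"],
})

# 1. Vectorized filter: keep rows whose cited dog is itself a listed dog.
known = all_cites_dog["Dog_Number"].unique()
mask = all_cites_dog["Cites_Dogs"].isin(known)

# 2. Group the surviving rows and collect the citations into lists.
s = all_cites_dog[mask].groupby("Dog_Number")["Cites_Dogs"].apply(list)

# 3. Dogs with no surviving rows are absent from s; give each its own empty list.
missing = np.setdiff1d(known, s.index)
s1 = pd.Series([[] for _ in missing], index=missing)

# 4. Combine both parts and restore a sorted index.
result = pd.concat([s, s1]).sort_index()
print(result.to_dict())
```

<p>Building the empty lists with a comprehension rather than <code>[[]] * n</code> gives every missing dog a distinct list object, which matters if the lists are modified in place later.</p>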