<p>I think the following should work:</p>
<pre><code>In [37]:
import pandas as pd
import io
temp = """InteractorA InteractorB
AGAP028204 AGAP005846
AGAP028204 AGAP003428
AGAP028200 AGAP011124
AGAP028200 AGAP004335
AGAP028200 AGAP011356
AGAP028194 AGAP008414
AGAP002741 AGAP008026
AGAP008026 AGAP002741"""
df = pd.read_csv(io.StringIO(temp), sep=r'\s+')
df
Out[37]:
InteractorA InteractorB
0 AGAP028204 AGAP005846
1 AGAP028204 AGAP003428
2 AGAP028200 AGAP011124
3 AGAP028200 AGAP004335
4 AGAP028200 AGAP011356
5 AGAP028194 AGAP008414
6 AGAP002741 AGAP008026
7 AGAP008026 AGAP002741
</code></pre>
<p>I then downloaded your data and realized I had misunderstood what you wanted, so the following should work instead:</p>
^{pr2}$
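<p>Now we want the rows whose <code>InteractorA</code> value also appears in <code>InteractorB</code>, keeping only the first row for each <code>InteractorA</code>:</p>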
<pre><code>In [74]:
df2 = df[df.InteractorA.isin(df.InteractorB)]
df2 = df2.groupby('InteractorA').first().reset_index()
df2.shape
Out[74]:
(3074, 2)
</code></pre>
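<p>To see what the filter and <code>groupby(...).first()</code> do, here is a minimal sketch on toy data. The single-letter IDs are hypothetical placeholders, not values from the real dataset:</p>

```python
import pandas as pd

# Hypothetical toy data; the IDs are placeholders, not from the real dataset.
df = pd.DataFrame(
    {
        "InteractorA": ["A", "A", "B", "D", "D"],
        "InteractorB": ["B", "C", "A", "E", "F"],
    }
)

# Keep rows whose InteractorA value also occurs somewhere in InteractorB.
mask = df["InteractorA"].isin(df["InteractorB"])
df2 = df[mask]

# Collapse to one row per InteractorA, keeping the first InteractorB seen.
df2 = df2.groupby("InteractorA").first().reset_index()
print(df2)
```

<p>Here <code>A</code> and <code>B</code> both occur in the <code>InteractorB</code> column, so their rows survive the filter, and the two <code>A</code> rows collapse into one. Note that <code>groupby</code> sorts the result by key, so the row order may differ from the input.</p>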
<p>Now concatenate the two dataframes:</p>
<pre><code>In [75]:
merged = pd.concat([df1, df2], ignore_index=True)
merged.shape
Out[75]:
(5460, 2)
</code></pre>
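<p>As a small illustration of the concatenation step, the sketch below uses two hypothetical stand-in frames (their contents are assumptions for demonstration only). It shows that <code>ignore_index=True</code> discards the original row labels and renumbers the result from zero:</p>

```python
import pandas as pd

# Hypothetical stand-ins for the two frames being combined.
# df1 deliberately carries non-zero-based row labels to show the renumbering.
df1 = pd.DataFrame(
    {"InteractorA": ["D", "D"], "InteractorB": ["E", "F"]}, index=[3, 4]
)
df2 = pd.DataFrame({"InteractorA": ["A", "B"], "InteractorB": ["B", "A"]})

# Stack the rows of df2 under df1 and rebuild a fresh 0..n-1 index.
merged = pd.concat([df1, df2], ignore_index=True)
print(merged.shape)
print(merged.index.tolist())
```

<p>The row count of <code>merged</code> is simply the sum of the row counts of the inputs, which is a quick sanity check on the shapes.</p>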
<p>I think this is correct.</p>