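<p>I think you need <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html" rel="nofollow noreferrer"><code>numpy.sort</code></a> to sort the values within each row, followed by <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html" rel="nofollow noreferrer"><code>drop_duplicates</code></a>. Note that this returns the rows with their values sorted:</p>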
<pre><code>df[['seq_sp1','seq_sp2']] = np.sort(df[['seq_sp1','seq_sp2']], axis=1)
df = df.drop_duplicates(subset=['seq_sp1','seq_sp2'])
print (df)
cluster seq_sp1 seq_sp2
0 1 seq20 seq56
2 2 seq3 seq5
3 3 seq5 seq9
4 3 seq4 seq7
</code></pre>
<p>Alternatively, use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html" rel="nofollow noreferrer"><code>duplicated</code></a> on the sorted values to build a boolean mask, invert it with <code>~</code>, and filter with <a href="http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing" rel="nofollow noreferrer">boolean indexing</a>, so that the output keeps the original, unsorted values:</p>
<pre><code>mask = pd.DataFrame(np.sort(df[['seq_sp1','seq_sp2']], axis=1), index=df.index).duplicated()
df = df[~mask]
print (df)
cluster seq_sp1 seq_sp2
0 1 seq20 seq56
2 2 seq3 seq5
3 3 seq9 seq5
4 3 seq7 seq4
</code></pre>
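<p>For comparison, here is a minimal self-contained sketch that runs both approaches side by side. The input frame is an assumption reconstructed from the printed output above: in particular, row <code>1</code> (the reversed pair that gets dropped) is hypothetical, since the original input is not shown here.</p>

```python
import numpy as np
import pandas as pd

# Assumed input: row 1 is a hypothetical reversed copy of row 0.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 3, 3],
    "seq_sp1": ["seq20", "seq56", "seq3", "seq9", "seq7"],
    "seq_sp2": ["seq56", "seq20", "seq5", "seq5", "seq4"],
})
cols = ["seq_sp1", "seq_sp2"]

# Approach 1: overwrite the pair columns with row-wise sorted values,
# then drop duplicates. The surviving rows hold the sorted values.
sorted_df = df.copy()
sorted_df[cols] = np.sort(sorted_df[cols].to_numpy(), axis=1)
deduped_sorted = sorted_df.drop_duplicates(subset=cols)

# Approach 2: sort only inside a temporary frame to build the mask,
# then filter the untouched original, so the values keep their order.
mask = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index).duplicated()
deduped_original = df[~mask]

print(deduped_sorted)
print(deduped_original)
```

<p>Both results keep the same index labels; they differ only in whether the two sequence columns are shown sorted or as originally entered.</p>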
<p>EDIT:</p>
<p>I tested it with the new data:</p>
<pre><code># .copy() avoids a SettingWithCopyWarning on the assignment further down
df = df[['qseqid','sseqid']].copy()
print (df)
qseqid sseqid
13 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0035_1
14 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0042_1
16 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0035_1
17 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0042_1
19 EOG090X00GO_0042_0035_1 EOG090X00GO_0035_0035_1
20 EOG090X00GO_0042_0035_1 EOG090X00GO_0035_0042_1
22 EOG090X00GO_0042_0042_1 EOG090X00GO_0035_0035_1
23 EOG090X00GO_0042_0042_1 EOG090X00GO_0035_0042_1
df[['qseqid','sseqid']] = np.sort(df[['qseqid','sseqid']], axis=1)
df = df.drop_duplicates(subset=['qseqid','sseqid'])
print (df)
qseqid sseqid
13 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0035_1
14 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0042_1
16 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0035_1
17 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0042_1
</code></pre>
<hr/>
<pre><code>mask = pd.DataFrame(np.sort(df[['qseqid','sseqid']], axis=1), index=df.index).duplicated()
print (~mask)
13 True
14 True
16 True
17 True
19 False
20 False
22 False
23 False
dtype: bool
df = df[~mask]
print (df)
qseqid sseqid
13 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0035_1
14 EOG090X00GO_0035_0035_1 EOG090X00GO_0042_0042_1
16 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0035_1
17 EOG090X00GO_0035_0042_1 EOG090X00GO_0042_0042_1
</code></pre>
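<p>If this is needed in more than one place, the mask approach can be wrapped in a small helper. This is only a sketch: the function name <code>drop_unordered_duplicates</code> is made up for illustration, and the sample frame uses four of the rows printed above (13, 14, 19, 22) rather than the full data.</p>

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Keep the first row for each unordered combination of values in `cols`.

    Hypothetical helper, not part of pandas. The original values and
    index of the surviving rows are left untouched.
    """
    # Sort the values within each row so (a, b) and (b, a) look identical.
    sorted_values = np.sort(df[cols].to_numpy(), axis=1)
    # duplicated() marks every repeat after the first occurrence as True.
    mask = pd.DataFrame(sorted_values, index=df.index).duplicated()
    return df[~mask]


pairs = pd.DataFrame(
    {
        "qseqid": [
            "EOG090X00GO_0035_0035_1",
            "EOG090X00GO_0035_0035_1",
            "EOG090X00GO_0042_0035_1",
            "EOG090X00GO_0042_0042_1",
        ],
        "sseqid": [
            "EOG090X00GO_0042_0035_1",
            "EOG090X00GO_0042_0042_1",
            "EOG090X00GO_0035_0035_1",
            "EOG090X00GO_0035_0035_1",
        ],
    },
    index=[13, 14, 19, 22],
)

result = drop_unordered_duplicates(pairs, ["qseqid", "sseqid"])
print(result)
```

<p>Rows 19 and 22 are the reversed forms of rows 13 and 14, so only the first two rows should remain.</p>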