<p>Use <code>groupby</code> + <code>duplicated</code>:</p>
<pre><code>df[~df.groupby(df.A.eq('spec').cumsum()).apply(lambda x: x.duplicated()).values]
       A       B       C
0   spec   first  second
1   test   text1   text2
2    act  text12  text13
3    act  text14  text15
4   test  text32  text33
5    act  text34  text35
6   test  text85  text86
7    act  text87  text88
13  spec   third  fourth
14  test   text1   text2
15   act  text12  text13
16   act  text14  text15
17  test  text85  text86
18   act  text87  text88
</code></pre>
<hr/>
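<p><strong>Details</strong></p>
<p>We use <code>cumsum</code> to label every row that falls under a given "spec" entry: the counter increases by one at each "spec" row, so all rows up to the next "spec" share the same label. The group labels are:</p>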
<pre><code>df.A.eq('spec').cumsum()
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    2
14    2
15    2
16    2
17    2
18    2
19    2
20    2
21    2
22    2
23    2
Name: A, dtype: int64
</code></pre>
<p>We then group on this series and compute the duplicates within each group:</p>
<pre><code>df.groupby(df.A.eq('spec').cumsum()).apply(lambda x: x.duplicated()).values
array([False, False, False, False, False, False, False, False, True,
True, True, True, True, False, False, False, False, False,
False, True, True, True, True, True])
</code></pre>
<p>From here, all that remains is to keep the rows that correspond to <code>False</code> (that is, the rows that are <em>not</em> duplicates).</p>
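<p>For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch. The sample frame is made up for illustration (it is not the data from the question), and the helper name <code>drop_duplicates_within_blocks</code> and the temporary <code>_block</code> column are hypothetical. Instead of <code>groupby(...).apply(...)</code>, it adds the <code>cumsum</code> label as an extra column, so that a plain <code>duplicated()</code> only matches rows inside the same "spec" block; this should give the same mask, assuming the default <code>keep='first'</code>.</p>

```python
import pandas as pd


def drop_duplicates_within_blocks(df, marker="spec", column="A"):
    """Drop rows that repeat an earlier row of the same marker-delimited block.

    Hypothetical helper: `marker` and `column` default to the values used
    in the answer above ('spec' in column A).
    """
    # Each marker row starts a new block; cumsum turns that into a block id.
    block_id = df[column].eq(marker).cumsum()
    # With the block id as an extra column, two rows only count as
    # duplicates of each other when they sit in the same block.
    is_dup = df.assign(_block=block_id).duplicated()
    # Index the original frame, so the temporary column is not returned.
    return df[~is_dup]


# Made-up sample data: two blocks, each containing one repeated row.
df = pd.DataFrame(
    {
        "A": ["spec", "test", "act", "test", "spec", "test", "act", "act"],
        "B": ["first", "text1", "text12", "text1",
              "third", "text1", "text12", "text12"],
        "C": ["second", "text2", "text13", "text2",
              "fourth", "text2", "text13", "text13"],
    }
)

result = drop_duplicates_within_blocks(df)
print(result)
```

<p>In this sample, rows 3 and 7 repeat an earlier row of their own block and are dropped, while row 5 is kept even though it matches row 1, because it belongs to the second block.</p>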