<p>Here is a fully vectorized approach that uses a per-section counter and masks (each step is explained in the next section):</p>
<pre><code># create a counter that restarts at each divider row (0123 01234 01234)
divider = df['Pos'].eq('')
section = divider.cumsum()
counter = df['Pos'].groupby(section).cumcount()
# isolate the repeat1 and repeat2 sections (and flip repeat2 from 1234 to 4321)
rep1 = counter.where(df['Pos'].eq('repeat1'), 0)
rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)
# combine rep1 and rep2 (and replace divider rows with empty string)
df['B'] = rep1.add(rep2).mask(divider, '')
</code></pre>
<p>Output:</p>
<pre><code>#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
</code></pre>
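<p>To try the snippet end to end, here is a minimal self-contained sketch. The question's DataFrame is not reproduced here, so the sample <code>df</code> below is a hypothetical reconstruction from the printed output, assuming the divider rows hold an empty string in both <code>A</code> and <code>Pos</code>:</p>

```python
import pandas as pd

# Hypothetical sample data rebuilt from the printed output;
# assumes divider rows hold '' in both columns.
df = pd.DataFrame({
    'A':   ['Emo/3', 'Emo/4', 'Emo/1', 'Emo/3', '',
            'Emo/3', 'Emo/4', 'Emo/1', 'Emo/3', '',
            'Neu/5', 'Neu/2', 'Neu/5', 'Neu/2'],
    'Pos': ['repeat3'] * 4 + [''] + ['repeat1'] * 4 + [''] + ['repeat2'] * 4,
})

# counter restarts at each divider row: 0 1 2 3 | 0 1 2 3 4 | 0 1 2 3 4
divider = df['Pos'].eq('')
section = divider.cumsum()
counter = df['Pos'].groupby(section).cumcount()

# keep the counter on repeat1 rows; flip it on repeat2 rows (1..4 becomes 4..1)
rep1 = counter.where(df['Pos'].eq('repeat1'), 0)
rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)

# combine, then blank out the divider rows (column B becomes object dtype)
df['B'] = rep1.add(rep2).mask(divider, '')
print(df)
```

<p>Because <code>mask</code> inserts strings into an integer Series, <code>B</code> ends up with <code>object</code> dtype.</p>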
<hr/>
<h3>Steps</h3>
<ol>
<li><p>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.cumsum.html" rel="nofollow noreferrer"><code>cumsum</code></a> on the empty-string divider rows to create pseudo-groups (sections):</p>
<pre><code>divider = df['Pos'].eq('')
section = divider.cumsum()
# 0 0
# 1 0
# 2 0
# 3 0
# 4 1
# 5 1
# 6 1
# 7 1
# 8 1
# 9 2
# 10 2
# 11 2
# 12 2
# 13 2
# Name: Pos, dtype: int64
</code></pre>
</li>
<li><p>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html" rel="nofollow noreferrer"><code>groupby.cumcount</code></a> to create a counter within each section (it restarts at 0 on every divider row):</p>
<pre><code>counter = df['Pos'].groupby(section).cumcount()
# 0 0
# 1 1
# 2 2
# 3 3
# 4 0
# 5 1
# 6 2
# 7 3
# 8 4
# 9 0
# 10 1
# 11 2
# 12 3
# 13 4
# dtype: int64
</code></pre>
</li>
<li><p>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.where.html" rel="nofollow noreferrer"><code>where</code></a> to mask everything except the <code>repeat1</code> rows (masked rows become 0):</p>
<pre><code>rep1 = counter.where(df['Pos'].eq('repeat1'), 0)
# 0 0
# 1 0
# 2 0
# 3 0
# 4 0
# 5 1
# 6 2
# 7 3
# 8 4
# 9 0
# 10 0
# 11 0
# 12 0
# 13 0
# dtype: int64
</code></pre>
</li>
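<li><p>For the <code>repeat2</code> rows, flip the counter from 1234 to 4321 by subtracting 5 and taking the absolute value (the divider row holds 0, so the data rows count 1 to 4), then use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.where.html" rel="nofollow noreferrer"><code>where</code></a> again to mask everything else:</p>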
<pre><code>rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)
# 0 0
# 1 0
# 2 0
# 3 0
# 4 0
# 5 0
# 6 0
# 7 0
# 8 0
# 9 0
# 10 4
# 11 3
# 12 2
# 13 1
# dtype: int64
</code></pre>
</li>
<li><p>Column <code>B</code> is now just <code>rep1 + rep2</code>, but we also use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html" rel="nofollow noreferrer"><code>mask</code></a> to replace every <code>divider</code> row with an empty string:</p>
<pre><code>df['B'] = rep1.add(rep2).mask(divider, '')
#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
</code></pre>
</li>
</ol>
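<p>The <code>sub(5).abs()</code> flip assumes every <code>repeat2</code> section has exactly four data rows after its divider. If section lengths can vary, one possible variant (a sketch, not part of the approach above) is to drop the divider rows, count within each section with <code>cumcount()</code> for <code>repeat1</code> and <code>cumcount(ascending=False)</code> for <code>repeat2</code>, then reindex back to the full frame. The three-row <code>repeat2</code> section in this sample is hypothetical, chosen to show the difference:</p>

```python
import pandas as pd

# Hypothetical sample: the repeat2 section has 3 rows instead of 4.
df = pd.DataFrame({
    'Pos': ['repeat3'] * 4 + [''] + ['repeat1'] * 4 + [''] + ['repeat2'] * 3,
})

divider = df['Pos'].eq('')
section = divider.cumsum()

# work on the non-divider rows only, so each section's count starts at its first data row
data = df.loc[~divider, 'Pos']
keys = section[~divider]
up = data.groupby(keys).cumcount() + 1                     # 1, 2, 3, ...
down = data.groupby(keys).cumcount(ascending=False) + 1    # ..., 3, 2, 1

# ascending count for repeat1, descending count for repeat2, 0 elsewhere
b = up.where(data.eq('repeat1'), 0) + down.where(data.eq('repeat2'), 0)

# restore the divider rows (filled with 0, keeping int dtype), then blank them out
df['B'] = b.reindex(df.index, fill_value=0).mask(divider, '')
print(df)
```

<p>With the fixed <code>sub(5)</code> flip, that three-row section would come out as 4, 3, 2 instead of 3, 2, 1.</p>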