Answers to the question: Adding a column based on the values of another column in a DataFrame



Here is a fully vectorized approach using built-in counters and masks (the steps are explained in detail in the next section):
<pre><code># create a counter per section (0123, 01234, 01234, ...)
divider = df['Pos'].eq('')
section = divider.cumsum()
counter = df['Pos'].groupby(section).cumcount()

# isolate the repeat1 and repeat2 sections (and flip repeat2 from 1234 to 4321)
rep1 = counter.where(df['Pos'].eq('repeat1'), 0)
rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)

# combine rep1 and rep2 (and replace divider rows with an empty string)
df['B'] = rep1.add(rep2).mask(divider, '')
</code></pre>
Output:
<pre><code>#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
</code></pre>
<hr/>
<h3>Steps</h3>
<ol>
<li>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.cumsum.html" rel="nofollow noreferrer"><code>cumsum</code></a> to create pseudo-groups from the blank divider rows:
<pre><code>divider = df['Pos'].eq('')
section = divider.cumsum()

# 0     0
# 1     0
# 2     0
# 3     0
# 4     1
# 5     1
# 6     1
# 7     1
# 8     1
# 9     2
# 10    2
# 11    2
# 12    2
# 13    2
# Name: Pos, dtype: int64
</code></pre>
</li>
<li>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html" rel="nofollow noreferrer"><code>cumcount</code></a> to create a counter within each section. In every section after the first, the divider row gets count 0 and the data rows get 1 to 4:
<pre><code>counter = df['Pos'].groupby(section).cumcount()

# 0     0
# 1     1
# 2     2
# 3     3
# 4     0
# 5     1
# 6     2
# 7     3
# 8     4
# 9     0
# 10    1
# 11    2
# 12    3
# 13    4
# dtype: int64
</code></pre>
</li>
<li>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.where.html" rel="nofollow noreferrer"><code>where</code></a> to keep the counter only on the <code>repeat1</code> rows and set every other row to 0:
<pre><code>rep1 = counter.where(df['Pos'].eq('repeat1'), 0)

# 0     0
# 1     0
# 2     0
# 3     0
# 4     0
# 5     1
# 6     2
# 7     3
# 8     4
# 9     0
# 10    0
# 11    0
# 12    0
# 13    0
# dtype: int64
</code></pre>
</li>
<li>For the <code>repeat2</code> rows, reverse the counter from 1, 2, 3, 4 to 4, 3, 2, 1 (subtract 5 and take the absolute value), then use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.where.html" rel="nofollow noreferrer"><code>where</code></a> again to set every other row to 0:
<pre><code>rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)

# 0     0
# 1     0
# 2     0
# 3     0
# 4     0
# 5     0
# 6     0
# 7     0
# 8     0
# 9     0
# 10    4
# 11    3
# 12    2
# 13    1
# dtype: int64
</code></pre>
</li>
<li>Column <code>B</code> is now simply <code>rep1 + rep2</code>, except that we also use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html" rel="nofollow noreferrer"><code>mask</code></a> to replace every <code>divider</code> row with an empty string:
<pre><code>df['B'] = rep1.add(rep2).mask(divider, '')

#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
</code></pre>
</li>
</ol>
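The answer never shows how <code>df</code> is built, so the snippets cannot be run as they stand. The following is a minimal end-to-end sketch: the input frame is an assumption, reconstructed from the output table above (blank strings in both columns for the divider rows), and may differ from the asker's real data.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output table in the answer.
# Divider rows are assumed to hold empty strings in both columns.
df = pd.DataFrame({
    "A": ["Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Neu/5", "Neu/2", "Neu/5", "Neu/2"],
    "Pos": ["repeat3"] * 4 + [""] + ["repeat1"] * 4 + [""] + ["repeat2"] * 4,
})

# Blank rows mark section boundaries; their running sum labels each section.
divider = df["Pos"].eq("")
section = divider.cumsum()

# Position of each row within its section (the divider row, if any, is 0).
counter = df["Pos"].groupby(section).cumcount()

# Keep the counter on repeat1 rows, the reversed counter on repeat2 rows,
# and 0 everywhere else.
rep1 = counter.where(df["Pos"].eq("repeat1"), 0)
rep2 = counter.sub(5).abs().where(df["Pos"].eq("repeat2"), 0)

# Combine both parts and blank out the divider rows.
df["B"] = rep1.add(rep2).mask(divider, "")

print(df)
```

Two caveats follow from how the code works. Because the divider rows are filled with <code>''</code>, column <code>B</code> mixes integers and strings and therefore ends up with object dtype. The constant 5 in <code>counter.sub(5)</code> also depends on each <code>repeat2</code> section having a leading divider row followed by exactly four data rows, so sections of a different length would need a different constant.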
