Add a column based on the values of another column in a DataFrame

3 Answers

User

#1 · edited 2024-10-16 17:23:19

Here is a fully vectorized approach that uses a built-in counter and masks (the steps are explained in detail in the next section):

# create counter per section (0123401234...)
divider = df['Pos'].eq('')
section = divider.cumsum()
counter = df['Pos'].groupby(section).cumcount()

# isolate repeat1 and repeat2 sections (and flip the repeat2 counter from 1234 -> 4321)
rep1 = counter.where(df['Pos'].eq('repeat1'), 0)
rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)

# combine rep1 and rep2 (and replace divider rows with empty string)
df['B'] = rep1.add(rep2).mask(divider, '')

Output:

#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4                    
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9                    
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
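The snippet above assumes df already exists. Here is a minimal end-to-end sketch; the frame is reconstructed from the printed output, so the exact construction is an assumption rather than the asker's original data:

```python
import pandas as pd

# Sample frame rebuilt from the output shown above (assumption):
# empty-string rows act as dividers between sections.
df = pd.DataFrame({
    "A": ["Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Neu/5", "Neu/2", "Neu/5", "Neu/2"],
    "Pos": ["repeat3"] * 4 + [""] + ["repeat1"] * 4 + [""] + ["repeat2"] * 4,
})

# Counter that restarts at every divider row
divider = df["Pos"].eq("")
section = divider.cumsum()
counter = df["Pos"].groupby(section).cumcount()

# Keep the counter for repeat1, the flipped counter for repeat2, 0 elsewhere
rep1 = counter.where(df["Pos"].eq("repeat1"), 0)
rep2 = counter.sub(5).abs().where(df["Pos"].eq("repeat2"), 0)

# Combine and blank out the divider rows
df["B"] = rep1.add(rep2).mask(divider, "")
print(df)
```

Note that the subtraction of 5 bakes in the assumption that every repeat2 section has exactly four rows after its divider.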

Steps

Use cumsum to create pseudo-groups from the empty-row dividers:

divider = df['Pos'].eq('')
section = divider.cumsum()

# 0     0
# 1     0
# 2     0
# 3     0
# 4     1
# 5     1
# 6     1
# 7     1
# 8     1
# 9     2
# 10    2
# 11    2
# 12    2
# 13    2
# Name: Pos, dtype: int64
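To see why this works in isolation: True counts as 1 in a cumulative sum, so the running total steps up by one at each divider. A tiny sketch on made-up data (the labels x, y, z are placeholders, not from the question):

```python
import pandas as pd

# Hypothetical column with two empty-string dividers
pos = pd.Series(["x", "x", "", "y", "y", "", "z"])

divider = pos.eq("")         # True only on divider rows
section = divider.cumsum()   # running count of dividers seen so far

print(section.tolist())      # [0, 0, 1, 1, 1, 2, 2]
```

Each divider row ends up as the first row of the section it opens, which matters for the counter in the next step.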

Use groupby.cumcount to create a counter within each section:

counter = df['Pos'].groupby(section).cumcount()

# 0     0
# 1     1
# 2     2
# 3     3
# 4     0
# 5     1
# 6     2
# 7     3
# 8     4
# 9     0
# 10    1
# 11    2
# 12    3
# 13    4
# dtype: int64
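A small sketch of the same call on made-up data, to highlight one detail: in every section after the first, the divider row takes position 0, so the data rows start at 1. The first section has no leading divider and starts at 0.

```python
import pandas as pd

# Hypothetical column; "" marks a divider row
pos = pd.Series(["a", "a", "", "b", "b", "", "c"])

section = pos.eq("").cumsum()
counter = pos.groupby(section).cumcount()  # 0-based position inside each section

print(counter.tolist())  # [0, 1, 0, 1, 2, 0, 1]
```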

Use where to mask everything except the repeat1 rows:

rep1 = counter.where(df['Pos'].eq('repeat1'), 0)

# 0     0
# 1     0
# 2     0
# 3     0
# 4     0
# 5     1
# 6     2
# 7     3
# 8     4
# 9     0
# 10    0
# 11    0
# 12    0
# 13    0
# dtype: int64
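For reference, where keeps the values where the condition is True and substitutes the second argument elsewhere; mask is the mirror image. A minimal sketch with made-up values:

```python
import pandas as pd

counter = pd.Series([0, 1, 2, 3, 4])
keep = pd.Series([False, True, True, False, False])

kept = counter.where(keep, 0)    # keep where True, else 0
same = counter.mask(~keep, 0)    # replace where True, i.e. the inverted condition

print(kept.tolist())  # [0, 1, 2, 0, 0]
```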

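For the repeat2 rows, flip the counter from 1234 to 4321 (subtract 5 and take the absolute value; the divider row's 0 becomes 5 but is masked out), then use where again to mask everything else: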

rep2 = counter.sub(5).abs().where(df['Pos'].eq('repeat2'), 0)

# 0     0
# 1     0
# 2     0
# 3     0
# 4     0
# 5     0
# 6     0
# 7     0
# 8     0
# 9     0
# 10    4
# 11    3
# 12    2
# 13    1
# dtype: int64
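Subtracting 5 only works when every repeat2 section has exactly four rows. If the section lengths vary, one possible variation (not part of the original answer) is to let cumcount count downwards with ascending=False. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data: one repeat2 section of 3 rows, one of 5 rows,
# each preceded by an empty-string divider
pos = pd.Series([""] + ["repeat2"] * 3 + [""] + ["repeat2"] * 5)

section = pos.eq("").cumsum()

# ascending=False numbers each section from len-1 down to 0;
# adding 1 makes the last row of a section equal to 1
down = pos.groupby(section).cumcount(ascending=False).add(1)
rep2 = down.where(pos.eq("repeat2"), 0)

print(rep2.tolist())  # [0, 3, 2, 1, 0, 5, 4, 3, 2, 1]
```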

So column B is now rep1 + rep2, but we also use mask to replace all divider rows with an empty string:

df['B'] = rep1.add(rep2).mask(divider, '')

#         A      Pos  B
# 0   Emo/3  repeat3  0
# 1   Emo/4  repeat3  0
# 2   Emo/1  repeat3  0
# 3   Emo/3  repeat3  0
# 4                    
# 5   Emo/3  repeat1  1
# 6   Emo/4  repeat1  2
# 7   Emo/1  repeat1  3
# 8   Emo/3  repeat1  4
# 9                    
# 10  Neu/5  repeat2  4
# 11  Neu/2  repeat2  3
# 12  Neu/5  repeat2  2
# 13  Neu/2  repeat2  1
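One side effect worth knowing: mixing integers with '' turns the column into object dtype. If you would rather keep B numeric, a possible alternative (my assumption, not something the question asks for) is a nullable integer column with missing values on the divider rows:

```python
import pandas as pd

values = pd.Series([0, 1, 2, 0, 4])
divider = pd.Series([False, False, False, True, False])

as_text = values.mask(divider, "")                          # object dtype
as_nullable = values.astype("Int64").mask(divider, pd.NA)   # stays integer

print(as_text.dtype, as_nullable.dtype)
```

The nullable version still supports arithmetic such as sum(), which skips the missing entries by default.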

User

#2 · edited 2024-10-16 17:23:19

A general solution using pandas tools

OK, it took me a while to figure this out, but I wanted to find a slick answer, and I kind of like this one:

import pandas as pd

data = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
        'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2", '', "repeat2", "repeat2","repeat2", "repeat2"],
        }
df = pd.DataFrame(data)

# First, create column B and set the rows marked repeat3 in 'Pos' to 0
df['B'] = df['Pos'].apply(lambda x: 0 if x == "repeat3" else x)

# Boolean mask for the rows where 'Pos' equals repeat1
mask1 = df['Pos'].eq("repeat1")
# Count the repeat1 blocks (assumes every block has exactly 4 rows)
number_of_repeat1_blocks = int(mask1.sum()) // 4

# Same again for the rows where 'Pos' equals repeat2
mask2 = df['Pos'].eq("repeat2")
# Count the repeat2 blocks (same 4-row assumption)
number_of_repeat2_blocks = int(mask2.sum()) // 4

# Define the number sequence to write in each case
# For rows matching repeat1
repl1 = [1, 2, 3, 4] * number_of_repeat1_blocks
# For rows matching repeat2
repl2 = [4, 3, 2, 1] * number_of_repeat2_blocks

# Finally, write the sequences into the matched rows
df.loc[mask1, 'B'] = repl1
df.loc[mask2, 'B'] = repl2


print(df)

Result:

        A      Pos  B
0   Emo/3  repeat3  0
1   Emo/4  repeat3  0
2   Emo/1  repeat3  0
3   Emo/3  repeat3  0
4                    
5   Emo/3  repeat1  1
6   Emo/4  repeat1  2
7   Emo/1  repeat1  3
8   Emo/3  repeat1  4
9                    
10  Neu/5  repeat2  4
11  Neu/2  repeat2  3
12  Neu/5  repeat2  2
13  Neu/2  repeat2  1
14                   
15  Neu/5  repeat2  4
16  Neu/2  repeat2  3
17  Neu/5  repeat2  2
18  Neu/2  repeat2  1
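The block count relies on every block having exactly four rows; if one block is shorter, the replacement list no longer matches the number of selected rows. Here is a sketch of the same idea wrapped in a helper that checks this up front. The name fill_blocks and its parameters are hypothetical, introduced only for illustration:

```python
import pandas as pd

def fill_blocks(df, label, pattern, column="B"):
    """Hypothetical helper: tile `pattern` over the rows whose Pos == label."""
    mask = df["Pos"].eq(label)
    n_rows = int(mask.sum())
    if n_rows % len(pattern):
        raise ValueError(
            f"{label}: {n_rows} rows is not a multiple of {len(pattern)}"
        )
    df.loc[mask, column] = pattern * (n_rows // len(pattern))

pos = (["repeat3"] * 4 + [""] + ["repeat1"] * 4 + [""]
       + ["repeat2"] * 4 + [""] + ["repeat2"] * 4)
df = pd.DataFrame({"Pos": pos})

# Start from an object column of empty strings so ints and '' can coexist
df["B"] = pd.Series("", index=df.index, dtype=object)

fill_blocks(df, "repeat3", [0])
fill_blocks(df, "repeat1", [1, 2, 3, 4])
fill_blocks(df, "repeat2", [4, 3, 2, 1])
print(df)
```

Like the original, this only checks the total row count per label; it does not verify that each individual block has the right length.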

User

#3 · edited 2024-10-16 17:23:19

Solution

I'm sure there are better ways, but here is one:

df["B"] = ""
repeat_mapping = {"repeat3": [0]*4,
                  "repeat2": [*range(4, 0, -1)],
                  "repeat1": [*range(1, 5)]}

repeats = df[::5]["Pos"].map(repeat_mapping).explode()
repeats.index += pd.Series([*range(4)]*len(df[::5]))

df.loc[repeats.index, "B"] = repeats  # single .loc call avoids chained assignment

Output:

        A      Pos  B
0   Emo/3  repeat3  0
1   Emo/4  repeat3  0
2   Emo/1  repeat3  0
3   Emo/3  repeat3  0
4
5   Emo/3  repeat1  1
6   Emo/4  repeat1  2
7   Emo/1  repeat1  3
8   Emo/3  repeat1  4
9
10  Neu/5  repeat2  4
11  Neu/2  repeat2  3
12  Neu/5  repeat2  2
13  Neu/2  repeat2  1
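For anyone who wants to run this from scratch, here is a self-contained sketch. The frame is rebuilt from the output above (an assumption), and I swapped in np.tile for the offsets and .loc for the final assignment, which differs slightly from the snippet as posted:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output (assumption)
df = pd.DataFrame({
    "A": ["Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Emo/3", "Emo/4", "Emo/1", "Emo/3", "",
          "Neu/5", "Neu/2", "Neu/5", "Neu/2"],
    "Pos": ["repeat3"] * 4 + [""] + ["repeat1"] * 4 + [""] + ["repeat2"] * 4,
})
df["B"] = pd.Series("", index=df.index, dtype=object)

repeat_mapping = {"repeat3": [0] * 4,
                  "repeat2": [*range(4, 0, -1)],
                  "repeat1": [*range(1, 5)]}

# First row of every 5-row block (4 data rows + 1 divider)
starts = df["Pos"].iloc[::5]
repeats = starts.map(repeat_mapping).explode()

# Shift the repeated index labels by 0, 1, 2, 3 within each block
repeats.index = repeats.index + np.tile(np.arange(4), len(starts))

df.loc[repeats.index, "B"] = repeats
print(df)
```

This depends on the layout being strictly regular: every block must start on a multiple of 5.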

Steps

Prepare the new column:

In [1]: df["B"] = ""

In [2]: df
Out[2]:
        A      Pos B
0   Emo/3  repeat3
1   Emo/4  repeat3
2   Emo/1  repeat3
3   Emo/3  repeat3
4
5   Emo/3  repeat1
6   Emo/4  repeat1
7   Emo/1  repeat1
8   Emo/3  repeat1
9
10  Neu/5  repeat2
11  Neu/2  repeat2
12  Neu/5  repeat2
13  Neu/2  repeat2

Grab every fifth row:

In [3]: df[::5]["Pos"]
Out[3]:
0     repeat3
5     repeat1
10    repeat2
Name: Pos, dtype: object

Apply repeat_mapping:

In [4]: df[::5]["Pos"].map(repeat_mapping)
Out[4]:
0     [0, 0, 0, 0]
5     [1, 2, 3, 4]
10    [4, 3, 2, 1]
Name: Pos, dtype: object

Explode the lists:

In [5]: repeats = df[::5]["Pos"].map(repeat_mapping).explode()

In [6]: repeats
Out[6]:
0     0
0     0
0     0
0     0
5     1
5     2
5     3
5     4
10    4
10    3
10    2
10    1
Name: Pos, dtype: object

Notice that each index in repeats is repeated 4 times. We fix this by incrementing each index by 0, 1, 2, 3:

In [7]: pd.Series([*range(4)]*len(df[::5])).values
Out[7]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int64)

In [8]: repeats.index += pd.Series([*range(4)]*len(df[::5]))

In [9]: repeats
Out[9]:
0     0
1     0
2     0
3     0
5     1
6     2
7     3
8     4
10    4
11    3
12    2
13    1
Name: Pos, dtype: object
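The hard-coded range(4) assumes every list has four items. If the lists could differ in length, one option (a variation, not what the answer above does) is to derive the offsets from the exploded Series itself with groupby(level=0).cumcount(). A sketch on made-up data:

```python
import pandas as pd

# Hypothetical lists of different lengths, keyed by their starting row
lists = pd.Series([[0, 0], [1, 2, 3]], index=[0, 3])
repeats = lists.explode()

# Position of each item inside its original list: 0, 1, 0, 1, 2
offsets = repeats.groupby(level=0).cumcount().to_numpy()
repeats.index = repeats.index + offsets

print(repeats.index.tolist())  # [0, 1, 3, 4, 5]
```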

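Finally, select the rows of column B whose index matches repeats.index and assign the values of repeats to them. Writing this as df.loc[repeats.index, "B"] = repeats does it in a single indexing step and avoids the chained form df["B"][repeats.index] = repeats, which pandas may apply to a temporary copy instead of df.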
