Generate columns based on conditions

Posted 2024-05-20 10:10:20


Here is a sample dataset:

>>> df
   vn    pt    st nst stb mid
0   a   0.1     a   b   0   3
1   a   0.2     a   b   4   3
2   a   0.3     a   b   1   3
3   a   0.3     b   a   1   3
4   a   0.4     a   b   1   3
5   a   0.4     a   b   2   3
6   a   0.5     c   b   6   3
7   a   0.5     c   b   0   3
8   a   0.6     c   b   1   3
9   a   1.1     b   c   2   3
10  a   1.2     b   c   1   3
11  a   1.3     d   b   6   3
12  a   1.4     d   b   0   3
13  a   1.4     d   b   1   3
14  a   1.5     e   d   2   3
15  a   1.6     d   e   0   3
16  a   0.1     d   y   1   7
17  a   0.2     y   d   4   7
18  a   0.3     y   d   1   7
19  a   0.4     y   x   3   7
20  a   0.5     x   z   0   7
21  a   0.6     p   z   2   7
22  a   0.6     z   p   6   7
23  a   1.1     p   q   3   7
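To reproduce the answers, the sample frame can be built directly. This is a minimal sketch; the values are copied by hand from the listing above, and the variable name `df` simply matches the one used in the question.

```python
import pandas as pd

# Sample data copied from the listing above: 24 rows in two mid blocks (3 and 7)
df = pd.DataFrame({
    "vn": ["a"] * 24,
    "pt": [0.1, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 1.1, 1.2, 1.3,
           1.4, 1.4, 1.5, 1.6, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 1.1],
    "st": list("aaabaacccbbddded" + "dyyyxpzp"),
    "nst": list("bbbabbbbbccbbbde" + "yddxzzpq"),
    "stb": [0, 4, 1, 1, 1, 2, 6, 0, 1, 2, 1, 6, 0, 1, 2, 0,
            1, 4, 1, 3, 0, 2, 6, 3],
    "mid": [3] * 16 + [7] * 8,
})
print(df)
```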

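From this dataset I want to create two new columns, `sr` and `nsr`. A few things to keep in mind: each `stb` value belongs to the corresponding value in `st`. When a new string is enrolled in `st` or `nst`, it starts by default with `sr=0` or `nsr=0` respectively.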

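Rules for `st`:
1. When the value of `st` stays the same on consecutive rows, `sr = sr + stb`.
2. When the value of `nst` moves to `st`, `sr = nsr + stb`.
3. When a new value is assigned to `st`, `sr = stb`.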

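Rules for `nst`:
1. When the value of `nst` stays the same on consecutive rows, `nsr` remains unchanged.
2. When the value of `st` moves to `nst`, the previous row's `sr` carries over to `nsr`.
3. When a new value is assigned to `nst`, `nsr = 0`.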

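The iteration continues as long as `mid` keeps the same value on consecutive rows; when a different `mid` appears, it starts over from the beginning. The following example shows how the two columns are generated: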

st nst stb  sr                                                  nsr
 a   b   0  0+0=0 (sr=sr+stb)                                   0 (nst newly enrolled, set to 0)
 a   b   4  0+4=4 (sr=sr+stb)                                   0 (remains the same)
 a   b   1  4+1=5 (sr=sr+stb)                                   0 (remains the same)
 b   a   1  0+1=1 (sr=nsr+stb), because b moves from nst to st  5 (shifts from sr to nsr)
 a   b   1  5+1=6 (sr=nsr+stb), because a moves from nst to st  1 (shifts from sr to nsr)
 a   b   2  6+2=8 (sr=sr+stb)                                   1 (remains the same)
 c   b   6  0+6=6 (sr=sr+stb), c newly inserted                 1 (remains the same)
...........
(continues the same way for as long as `mid` stays the same)
...........
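The rules above can be sketched as a plain row-by-row loop. This is a minimal sketch, not the method used in either answer, and it rests on assumptions: every check compares against the immediately preceding row, "same value" is tested before "moved from the other column", and the first row of each `mid` block counts as newly enrolled. The helper name `add_sr_nsr` is hypothetical.

```python
import pandas as pd

# Only the columns the rules use; values copied from the sample dataset
df = pd.DataFrame({
    "st": list("aaabaacccbbddded" + "dyyyxpzp"),
    "nst": list("bbbabbbbbccbbbde" + "yddxzzpq"),
    "stb": [0, 4, 1, 1, 1, 2, 6, 0, 1, 2, 1, 6, 0, 1, 2, 0,
            1, 4, 1, 3, 0, 2, 6, 3],
    "mid": [3] * 16 + [7] * 8,
})


def add_sr_nsr(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the st/nst rules row by row, restarting when mid changes."""
    sr_out, nsr_out = [], []
    prev_mid = prev_st = prev_nst = None
    prev_sr = prev_nsr = 0
    cols = frame[["mid", "st", "nst", "stb"]]
    for mid, st, nst, stb in cols.itertuples(index=False):
        if mid != prev_mid:
            # First row of a mid block: treat both st and nst as newly enrolled
            sr, nsr = stb, 0
        else:
            # Rules for st
            if st == prev_st:        # same st as the previous row: keep accumulating
                sr = prev_sr + stb
            elif st == prev_nst:     # value moved from nst to st
                sr = prev_nsr + stb
            else:                    # new value in st: starts from 0
                sr = stb
            # Rules for nst
            if nst == prev_nst:      # same nst as the previous row: unchanged
                nsr = prev_nsr
            elif nst == prev_st:     # value moved from st to nst: take the previous sr
                nsr = prev_sr
            else:                    # new value in nst
                nsr = 0
        sr_out.append(int(sr))
        nsr_out.append(int(nsr))
        prev_mid, prev_st, prev_nst = mid, st, nst
        prev_sr, prev_nsr = sr, nsr
    return frame.assign(sr=sr_out, nsr=nsr_out)


out = add_sr_nsr(df)
print(out)
```

On the sample data this loop yields the same `sr` and `nsr` values as the expected output. A loop like this is easy to audit against the written rules, but it will likely be slower than a vectorized version on large frames.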

Expected output:

   vn    pt    st  sr nsr
0   a   0.1     a   0   0
1   a   0.2     a   4   0
2   a   0.3     a   5   0
3   a   0.3     b   1   5
4   a   0.4     a   6   1
5   a   0.4     a   8   1
6   a   0.5     c   6   1
7   a   0.5     c   6   1
8   a   0.6     c   7   1
9   a   1.1     b   3   7
10  a   1.2     b   4   7
11  a   1.3     d   6   4
12  a   1.4     d   6   4
13  a   1.4     d   7   4
14  a   1.5     e   2   7
15  a   1.6     d   7   2
16  a   0.1     d   1   0
17  a   0.2     y   4   1
18  a   0.3     y   5   1
19  a   0.4     y   8   0
20  a   0.5     x   0   0
21  a   0.6     p   2   0
22  a   0.6     z   6   2
23  a   1.1     p   5   0 

2 Answers

(Partial attempt pending feedback; too long to fit in a comment.)

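Based on your explanation, `sr` is a separate cumulative sum of `stb` for each (`st`, `nst`) pair. However, this does not fully match your expected output: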

>>> df['sr'] = df.groupby(['nst', 'st'])['stb'].cumsum()
>>> df[['sr']].join([expected['sr'].rename('expected'), (df['sr'] - expected['sr']).rename('diff')])
    sr  expected  diff
0    0         0     0
1    4         4     0
2    5         5     0
3    1         1     0
4    6         6     0
5    8         8     0
6    6         6     0
7    6         6     0
8    7         7     0
9    2         3    -1
10   3         4    -1
11   6         6     0
12   6         6     0
13   7         7     0
14   2         2     0
15   0         7    -7
16   1         1     0
17   4         4     0
18   5         5     0
19   3         8    -5
20   0         0     0
21   2         2     0
22   6         6     0
23   3         5    -2

What is happening in rows 9, 10, 15, 19 and 23?

For example, row 9 is the first row with `b, c`. If I compare it with row 3, the first row with `b, a`, it should be 0+3, just as row 3 is 0+1.
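The comparison in this answer uses an `expected` frame that is never defined. A self-contained sketch, assuming `expected` simply holds the question's expected `sr` column (copied by hand here), that reproduces the mismatching rows; `pd.concat` is used in place of `join` only to keep the example explicit:

```python
import pandas as pd

# Sample columns copied from the question
df = pd.DataFrame({
    "st": list("aaabaacccbbddded" + "dyyyxpzp"),
    "nst": list("bbbabbbbbccbbbde" + "yddxzzpq"),
    "stb": [0, 4, 1, 1, 1, 2, 6, 0, 1, 2, 1, 6, 0, 1, 2, 0,
            1, 4, 1, 3, 0, 2, 6, 3],
})
# Assumed `expected` frame: the expected sr column from the question
expected = pd.DataFrame({"sr": [0, 4, 5, 1, 6, 8, 6, 6, 7, 3, 4, 6,
                                6, 7, 2, 7, 1, 4, 5, 8, 0, 2, 6, 5]})

# Separate running total of stb for each (nst, st) pair
df["sr"] = df.groupby(["nst", "st"])["stb"].cumsum()

cmp = pd.concat(
    [df["sr"],
     expected["sr"].rename("expected"),
     (df["sr"] - expected["sr"]).rename("diff")],
    axis=1,
)
mismatches = cmp[cmp["diff"] != 0]
print(mismatches)
```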

Based on the questions and discussion in the comments, here is a partial solution so far:

The `sr` column already gives the expected result, but `nsr` needs more work:

df['sr'] = df.groupby(['mid', 'st'])['stb'].cumsum()

Result:

print(df)

   vn   pt st nst  stb  mid  sr
0   a  0.1  a   b    0    3   0
1   a  0.2  a   b    4    3   4
2   a  0.3  a   b    1    3   5
3   a  0.3  b   a    1    3   1
4   a  0.4  a   b    1    3   6
5   a  0.4  a   b    2    3   8
6   a  0.5  c   b    6    3   6
7   a  0.5  c   b    0    3   6
8   a  0.6  c   b    1    3   7
9   a  1.1  b   c    2    3   3
10  a  1.2  b   c    1    3   4
11  a  1.3  d   b    6    3   6
12  a  1.4  d   b    0    3   6
13  a  1.4  d   b    1    3   7
14  a  1.5  e   d    2    3   2
15  a  1.6  d   e    0    3   7
16  a  0.1  d   y    1    7   1
17  a  0.2  y   d    4    7   4
18  a  0.3  y   d    1    7   5
19  a  0.4  y   x    3    7   8
20  a  0.5  x   z    0    7   0
21  a  0.6  p   z    2    7   2
22  a  0.6  z   p    6    7   6
23  a  1.1  p   q    3    7   5
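A quick self-contained check that the per-(`mid`, `st`) running total reproduces the question's expected `sr` column. One hedge: the question's rules reset `sr` to `stb` when a value enters `st` without having been in `st` or `nst` on the previous row, which a plain cumulative sum would not do; in this sample that only happens on a value's first appearance in a `mid` block, so the sample alone may not distinguish the two approaches.

```python
import pandas as pd

# Columns needed for sr, copied from the sample dataset
df = pd.DataFrame({
    "st": list("aaabaacccbbddded" + "dyyyxpzp"),
    "stb": [0, 4, 1, 1, 1, 2, 6, 0, 1, 2, 1, 6, 0, 1, 2, 0,
            1, 4, 1, 3, 0, 2, 6, 3],
    "mid": [3] * 16 + [7] * 8,
})
# Expected sr column from the question
expected_sr = [0, 4, 5, 1, 6, 8, 6, 6, 7, 3, 4, 6,
               6, 7, 2, 7, 1, 4, 5, 8, 0, 2, 6, 5]

# Running total of stb per st value, restarting in each mid block
df["sr"] = df.groupby(["mid", "st"])["stb"].cumsum()
print(df["sr"].tolist() == expected_sr)  # True
```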

Partial work on `nsr`:

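import numpy as np

# st differs from the previous st in the same mid block (True on each block's first row)
m1 = df['st'].ne(df['st'].groupby(df['mid']).shift())
# the new st came from the previous nst, or the new nst came from the previous st
m2 = df['st'].eq(df['nst'].shift())
m3 = df['nst'].eq(df['st'].shift())
m = m1 & (m2 | m3)

# on those rows nsr takes the previous row's sr; elsewhere leave NaN to fill later
df['nsr'] = np.where(m, df['sr'].shift(), np.nan)

# the first row of each mid block starts at 0
m11 = df['mid'] != df['mid'].shift()
df['nsr'] = np.where(m11, 0, df['nsr'])

# forward-fill the gaps; downcast='infer' is deprecated in recent pandas, so cast explicitly
df['nsr'] = df['nsr'].ffill().astype(int)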

Result:

print(df)

   vn   pt st nst  stb  mid  sr  nsr
0   a  0.1  a   b    0    3   0    0
1   a  0.2  a   b    4    3   4    0
2   a  0.3  a   b    1    3   5    0
3   a  0.3  b   a    1    3   1    5
4   a  0.4  a   b    1    3   6    1
5   a  0.4  a   b    2    3   8    1
6   a  0.5  c   b    6    3   6    1
7   a  0.5  c   b    0    3   6    1
8   a  0.6  c   b    1    3   7    1
9   a  1.1  b   c    2    3   3    7
10  a  1.2  b   c    1    3   4    7
11  a  1.3  d   b    6    3   6    4
12  a  1.4  d   b    0    3   6    4
13  a  1.4  d   b    1    3   7    4
14  a  1.5  e   d    2    3   2    7
15  a  1.6  d   e    0    3   7    2
16  a  0.1  d   y    1    7   1    0
17  a  0.2  y   d    4    7   4    1
18  a  0.3  y   d    1    7   5    1
19  a  0.4  y   x    3    7   8    1
20  a  0.5  x   z    0    7   0    8
21  a  0.6  p   z    2    7   2    8
22  a  0.6  z   p    6    7   6    2
23  a  1.1  p   q    3    7   5    6
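The masks in this answer compare each row with the previous one using `shift()`, but only `m1` shifts within `mid` groups. A small made-up example (the `toy` frame is purely illustrative) shows what that grouping changes at a `mid` boundary:

```python
import pandas as pd

# Made-up toy frame: two mid blocks, with st equal to "b" on both sides of the boundary
toy = pd.DataFrame({"mid": [3, 3, 3, 7, 7], "st": ["a", "a", "b", "b", "b"]})

# Plain shift: compares with the previous row regardless of mid
plain = toy["st"].ne(toy["st"].shift())

# Grouped shift (as in m1): the first row of each mid block has no predecessor,
# so it is compared against NaN and ne() returns True there
grouped = toy["st"].ne(toy["st"].groupby(toy["mid"]).shift())

print(plain.tolist())    # [True, False, True, False, False]
print(grouped.tolist())  # [True, False, True, True, False]
```

On row 3 the grouped version reports a change even though the previous row also has `st == "b"`, because that row belongs to a different `mid` block.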

Edit

Here is another attempt to complete the partial work left over from last time.

By adding one more set of conditions, the expected `nsr` values are finally achieved:

import numpy as np

m1 = df['st'].ne(df['st'].groupby(df['mid']).shift())
m2 = df['st'].eq(df['nst'].shift())
m3 = df['nst'].eq(df['st'].shift())
m = m1 & (m2 | m3)

df['nsr'] = np.where(m, df['sr'].shift(), np.nan)

## Handle the case where a new value of `nst` is seen AND,
## at the same time, it is NOT shifted from `st`:
# start of new code
m21 = df['nst'] != df['nst'].shift()
m22 = df['nst'] != df['st'].shift()
df['nsr'] = np.where(m21 & m22, 0, df['nsr'])
# end of new code

m11 = df['mid'] != df['mid'].shift()
df['nsr'] = np.where(m11, 0, df['nsr'])

# downcast='infer' is deprecated in recent pandas; cast to int explicitly instead
df['nsr'] = df['nsr'].ffill().astype(int)

Result:

print(df)

   vn   pt st nst  stb  mid  sr  nsr
0   a  0.1  a   b    0    3   0    0
1   a  0.2  a   b    4    3   4    0
2   a  0.3  a   b    1    3   5    0
3   a  0.3  b   a    1    3   1    5
4   a  0.4  a   b    1    3   6    1
5   a  0.4  a   b    2    3   8    1
6   a  0.5  c   b    6    3   6    1
7   a  0.5  c   b    0    3   6    1
8   a  0.6  c   b    1    3   7    1
9   a  1.1  b   c    2    3   3    7
10  a  1.2  b   c    1    3   4    7
11  a  1.3  d   b    6    3   6    4
12  a  1.4  d   b    0    3   6    4
13  a  1.4  d   b    1    3   7    4
14  a  1.5  e   d    2    3   2    7
15  a  1.6  d   e    0    3   7    2
16  a  0.1  d   y    1    7   1    0
17  a  0.2  y   d    4    7   4    1
18  a  0.3  y   d    1    7   5    1
19  a  0.4  y   x    3    7   8    0
20  a  0.5  x   z    0    7   0    0
21  a  0.6  p   z    2    7   2    0
22  a  0.6  z   p    6    7   6    2
23  a  1.1  p   q    3    7   5    0
