Finding packets of repeated values and operating on each one separately

2 Answers

User

#1 · edited 2024-05-04 21:21:44

Approach #1

Here is a simple approach:

In [23]: ids = np.r_[0,b[:-1]!=b[1:]].cumsum()

In [24]: np.where(b==1,a.groupby(ids).transform('mean'),a)
Out[24]: 
array([1.        , 4.        , 4.        , 4.        , 7.        ,
       5.        , 6.        , 6.        , 6.        , 6.        ,
       7.        , 3.33333333, 3.33333333, 3.33333333, 6.        ,
       9.        ])
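The question's input data is not reproduced in this thread, so the following is only a sketch of Approach #1 on hypothetical inputs: `a` and `b` below are assumed values chosen to be consistent with the output shown, not the asker's actual data. It shows how `ids` labels each consecutive run of equal values in `b`, so that the group mean is taken per run rather than per distinct value.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs (assumed; the original question's data is not shown).
a = pd.Series([1, 3, 4, 5, 7, 5, 6, 6, 6, 6, 7, 6, 2, 2, 6, 9])
b = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0])

# Flag every position where b differs from the previous element;
# the leading 0 stands for the first element, which has no predecessor.
changed = np.r_[0, b[:-1] != b[1:]]

# The running sum of the flags gives one integer label per consecutive run.
ids = changed.cumsum()
print(ids)

# Replace values by their run mean only where b == 1; keep a elsewhere.
result = np.where(b == 1, a.groupby(ids).transform('mean'), a)
print(result)
```

Note that `b` is assumed to be a NumPy array here. With a plain list, `b[:-1] != b[1:]` would compare the two lists as a whole and return a single boolean.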

Approach #2

For performance, we can leverage np.bincount:

In [47]: v = np.bincount(ids,a)/np.bincount(ids)

In [48]: np.where(b==1,v[ids],a)
Out[48]: 
array([1.        , 4.        , 4.        , 4.        , 7.        ,
       5.        , 6.        , 6.        , 6.        , 6.        ,
       7.        , 3.33333333, 3.33333333, 3.33333333, 6.        ,
       9.        ])
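To make the `np.bincount` step easier to follow, here is a minimal sketch wrapped in a helper. The function name `run_means_bincount` and the sample arrays are hypothetical, introduced only for illustration. `np.bincount(ids, weights)` sums the weights per label and `np.bincount(ids)` counts the members per label, so their ratio is the per-run mean.

```python
import numpy as np

def run_means_bincount(a, b):
    """Replace a[i] by the mean of its run wherever b[i] == 1.

    Hypothetical helper; a and b are assumed to be 1-D arrays of equal length.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b)
    # Label consecutive runs of equal values in b: 0, 1, 2, ...
    ids = np.r_[0, b[:-1] != b[1:]].cumsum()
    # Per-run sum divided by per-run count gives the per-run mean.
    run_mean = np.bincount(ids, weights=a) / np.bincount(ids)
    # Broadcast each run's mean back to its members, but only where b == 1.
    return np.where(b == 1, run_mean[ids], a)

# Assumed sample data, chosen to be consistent with the output shown above.
a = np.array([1, 3, 4, 5, 7, 5, 6, 6, 6, 6, 7, 6, 2, 2, 6, 9])
b = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0])
print(run_means_bincount(a, b))
```

This version avoids the pandas groupby machinery entirely, which is presumably where the performance gain comes from. No timings are claimed here.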

User

#2 · edited 2024-05-04 21:21:44

Try shift + cumsum. Note that the mean of 6, 2, 2 is 3.333..., not 5.

s = pd.Series(b,index=a.index)
a.groupby(s.ne(s.shift()).cumsum()).transform('mean').where(s.eq(1),a)

0     1.000000
1     4.000000
2     4.000000
3     4.000000
4     7.000000
5     5.000000
6     6.000000
7     6.000000
8     6.000000
9     6.000000
10    7.000000
11    3.333333
12    3.333333
13    3.333333
14    6.000000
15    9.000000
dtype: float64
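The one-liner above can be unpacked into steps. This is a sketch on hypothetical inputs (`a` and `b` are assumed values consistent with the output shown, since the question's data is not included in this thread), and the intermediate names `run_start`, `run_id`, and `run_mean` are introduced only for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs (assumed; the original question's data is not shown).
a = pd.Series([1, 3, 4, 5, 7, 5, 6, 6, 6, 6, 7, 6, 2, 2, 6, 9])
b = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0])

# Align b with a's index so the boolean masks line up row by row.
s = pd.Series(b, index=a.index)

# True at the first element of every consecutive run (shift() puts NaN first,
# so the very first row always counts as a run start).
run_start = s.ne(s.shift())

# The cumulative sum of the flags yields a label per run: 1, 2, 3, ...
run_id = run_start.cumsum()

# Mean of a within each run, broadcast back to the original shape.
run_mean = a.groupby(run_id).transform('mean')

# Keep the run mean where s == 1; fall back to the original a elsewhere.
result = run_mean.where(s.eq(1), a)
print(result)
```

Unlike the NumPy approaches, this keeps the result as a Series with the index of `a`, which may be convenient if the data lives in a DataFrame.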
