Fill missing column values based on the available values

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'farm': [419, 382, 382, 382, 411, 411, 411],
     'variety': ['Gala', 'Gala', 'Empire', '', 'Honeycrisp', '', 'Fuji'],
     'ripening': [2, 2, 3, 3, 3, 3, 6],
     'D': np.random.randn(7) * 10,
     'E': list('abcdefg')
    }
)
df
Out[223]:
   farm     variety  ripening          D  E
0   419        Gala         2  12.921246  a
1   382        Gala         2  -2.776150  b
2   382      Empire         3   3.551226  c
3   382                     3   2.715187  d
4   411  Honeycrisp         3 -13.557640  e
5   411                     3 -11.525100  f
6   411        Fuji         6  -3.660661  g

Expected output:

   farm     variety  ripening          D  E
0   419        Gala         2  12.921246  a
1   382        Gala         2  -2.776150  b
2   382      Empire         3   3.551226  c
3   382      Empire         3   2.715187  d
4   411  Honeycrisp         3 -13.557640  e
5   411  Honeycrisp         3 -11.525100  f
6   411        Fuji         6  -3.660661  g

1 answer

User
#1 · Posted 2024-05-17 04:05:24

Use:
# replace empty strings with NaN so they count as missing
df['variety'] = df['variety'].replace('', np.nan)
# test whether each (farm, ripening) group has only 1 unique variety
m = df.groupby(['farm', 'ripening'])['variety'].transform('nunique').eq(1)
# forward fill values per group, only for the filtered rows
df.update(df[m].groupby(['farm', 'ripening'])['variety'].ffill())
print(df)

   farm     variety  ripening          D  E
0   419        Gala         2 -12.571434  a
1   382        Gala         2   1.839992  b
2   382      Empire         3  18.946881  c
3   382      Empire         3   6.552552  d
4   411  Honeycrisp         3  11.755782  e
5   411  Honeycrisp         3  11.272973  f
6   411        Fuji         6   7.416918  g
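
To see why the nunique mask matters, here is a minimal self-contained sketch. The data is hypothetical: it keeps the farm/variety/ripening columns from the question but adds an invented farm 500 whose group contains two different varieties, so its blank cannot be resolved safely. It also uses Series.where instead of df.update, which is only one possible way to write the result back, not necessarily a better one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: farm 500 is an invented, ambiguous group (two varieties + a blank)
df = pd.DataFrame({
    'farm':     [382, 382, 411, 411, 500, 500, 500],
    'variety':  ['Empire', '', 'Honeycrisp', '', 'Gala', 'Fuji', ''],
    'ripening': [3, 3, 3, 3, 4, 4, 4],
})

# Treat empty strings as missing values
df['variety'] = df['variety'].replace('', np.nan)

keys = ['farm', 'ripening']

# True for rows whose (farm, ripening) group has exactly one distinct known variety
unambiguous = df.groupby(keys)['variety'].transform('nunique').eq(1)

# Forward fill inside each group; the result is aligned to the original index
filled = df.groupby(keys)['variety'].ffill()

# Keep the original value in ambiguous groups, take the filled value elsewhere
df['variety'] = df['variety'].where(~unambiguous, filled)

print(df)
```

The blanks for farms 382 and 411 are filled, while the blank for farm 500 stays NaN because that group has both Gala and Fuji. Note that ffill only helps when a known value comes before the blank within its group; if blanks can come first, a backward fill would probably be needed as well.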
