Pandas: Assigning to a MultiIndex using .loc with a mask

import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

>>> dfmi
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
...          ...  ...  ...  ...
A3 B1 C1 D1  237  236  239  238
      C2 D0  241  240  243  242
         D1  245  244  247  246
      C3 D0  249  248  251  250
         D1  253  252  255  254

[64 rows x 4 columns]

mask = (
    (dfmi.loc[pd.IndexSlice[:, :, :, "D1"], ("a", "bar")] % 3 == 0)
    & (dfmi.loc[pd.IndexSlice[:, :, :, "D1"], ("a", "foo")] > 100)
)
dfmi.loc[pd.IndexSlice[:, :, :, "D0", mask], ("a", "bar")] = np.nan

2 Answers

User

Answer 1 · edited 2024-06-23 19:59:42

You can pull the D0 rows of the target column into a temporary MultiIndexed Series, d0:

d0 = dfmi.loc[pd.IndexSlice[:,:,:,"D0"], ('a','bar')]

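Next, pass the boolean values from mask to the Series.mask method to put nulls where the condition holds. Using mask.array rather than mask itself makes the match positional, which is needed because mask is indexed by the D1 rows while d0 is indexed by the D0 rows: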

d0 = d0.mask(mask.array)
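To illustrate the positional behaviour on its own, here is a minimal sketch with two small Series whose labels do not overlap. The names s and cond are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Two Series with deliberately different labels (hypothetical toy data)
s = pd.Series([10, 20, 30], index=["x", "y", "z"])
cond = pd.Series([True, False, True], index=["p", "q", "r"])

# Passing a plain array means the condition is applied by position,
# so the differing labels of `cond` play no role.
out = s.mask(cond.to_numpy())

print(out.isna().tolist())  # [True, False, True]
```

Here to_numpy() plays the same role as .array in the answer: it strips the index so that only the order of the values matters.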

Finally, write d0 back into the original DataFrame:

dfmi.loc[d0.index, ('a', 'bar')] = d0
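Put together, the three steps can be run end to end roughly as follows. This is only a sketch: it rebuilds the example frame with dtype=float, which is an assumption added here so that writing NaN does not require upcasting an integer column, and it uses to_numpy() in place of .array.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
# dtype=float is an assumption of this sketch, not part of the question
values = np.arange(len(miindex) * len(micolumns), dtype=float).reshape(
    (len(miindex), len(micolumns))
)
dfmi = (
    pd.DataFrame(values, index=miindex, columns=micolumns)
    .sort_index()
    .sort_index(axis=1)
)

idx = pd.IndexSlice

# Condition evaluated on the D1 rows only
mask = (
    (dfmi.loc[idx[:, :, :, "D1"], ("a", "bar")] % 3 == 0)
    & (dfmi.loc[idx[:, :, :, "D1"], ("a", "foo")] > 100)
)

# Step 1: temporary Series holding the D0 rows of the target column
d0 = dfmi.loc[idx[:, :, :, "D0"], ("a", "bar")]
# Step 2: null out entries by position, using the D1-based mask
d0 = d0.mask(mask.to_numpy())
# Step 3: write the result back, aligned on d0's own index
dfmi.loc[d0.index, ("a", "bar")] = d0

n_masked = int(dfmi[("a", "bar")].isna().sum())
print(n_masked)  # 6
```

The positional match in step 2 works because both slices come from the same sorted index, so the k-th D0 row and the k-th D1 row share the same A, B and C labels.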

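User

Answer 2 · edited 2024-06-23 19:59:42

Another option is to use NumPy's argwhere function to find the positions of the rows that satisfy all the conditions, then assign by position. This relies on each D0 row sitting directly above its D1 row, which holds for the sorted index used here.

For example: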

# Boolean arrays over all rows of dfmi
is_D1 = np.array([index[-1] == "D1" for index in dfmi.index])
is_multiple_of_3 = np.array(dfmi.loc[:, ("a", "bar")] % 3 == 0)
is_greater_than_100 = np.array(dfmi.loc[:, ("a", "foo")] > 100)
# Row positions of the D1 rows that meet both conditions
mask = np.argwhere(is_D1 & is_multiple_of_3 & is_greater_than_100).flatten()
# Step back one position to reach the matching D0 row
dfmi.iloc[mask - 1, dfmi.columns == ("a", "bar")] = np.nan
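If the D0 and D1 rows might not be adjacent, a label-based variant avoids the position arithmetic. The sketch below is an alternative technique, not what either answer proposes: it takes the (A, B, C) keys of the matching D1 rows and rebuilds the corresponding D0 labels. The names d1, hit and targets are illustrative, and dtype=float is again an assumption so that NaN can be stored without upcasting.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, built compactly
index = pd.MultiIndex.from_product(
    [
        [f"A{i}" for i in range(4)],
        [f"B{i}" for i in range(2)],
        [f"C{i}" for i in range(4)],
        [f"D{i}" for i in range(2)],
    ]
)
columns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
dfmi = (
    pd.DataFrame(
        np.arange(len(index) * len(columns), dtype=float).reshape(
            len(index), len(columns)
        ),
        index=index,
        columns=columns,
    )
    .sort_index()
    .sort_index(axis=1)
)

# D1 rows with the last index level dropped, leaving (A, B, C) keys
d1 = dfmi.xs("D1", level=3)
hit = (d1[("a", "bar")] % 3 == 0) & (d1[("a", "foo")] > 100)

# Rebuild full row labels that point at the D0 partner of each match
targets = pd.MultiIndex.from_tuples([(*key, "D0") for key in hit[hit].index])

dfmi.loc[targets, ("a", "bar")] = np.nan

n_masked = int(dfmi[("a", "bar")].isna().sum())
print(n_masked)  # 6
```

Because the targets are looked up by label, the result does not depend on how the rows happen to be ordered relative to each other.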
