Pandas: assigning to a MultiIndex using .loc with a mask



Using the example from the <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#using-slicers" rel="nofollow noreferrer">MultiIndex / advanced indexing: Using slicers</a> documentation:

<pre><code>import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)
</code></pre>

<pre><code>>>> dfmi
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
...          ...  ...  ...  ...
A3 B1 C1 D1  237  236  239  238
      C2 D0  241  240  243  242
         D1  245  244  247  246
      C3 D0  249  248  251  250
         D1  253  252  255  254

[64 rows x 4 columns]
</code></pre>

In pseudocode, what I want is:

<pre><code>if D1/bar % 3 == 0 && D1/foo > 100:
    D0/bar = np.nan
</code></pre>

Something close to this, though it does not work as written:

<pre><code>mask = (
    (dfmi.loc[pd.IndexSlice[:, :, :, "D1"], ("a", "bar")] % 3 == 0)
    & (dfmi.loc[pd.IndexSlice[:, :, :, "D1"], ("a", "foo")] > 100)
)
dfmi.loc[pd.IndexSlice[:, :, :, "D0", mask], ("a", "bar")] = np.nan
</code></pre>

The problem is that at any given index level, either a mask or a selector can be applied, but not both. For example, I could apply the mask at a different level. That requires either generating the mask with the full index (no missing values) or re-aligning it with the original index. How can I do either of these? Other approaches are welcome too.

<hr/>

Later:

I really thought this would work, since the innermost index level should have half the rows, but for some reason it raises a <code>ValueError</code>. Does anyone know why?

<pre><code>>>> dfmi.swaplevel(0,3).loc[pd.IndexSlice["D0",:,:,mask.values], ("a","bar")] = np.nan
...
ValueError: cannot index with a boolean indexer that is not the same length as the index
</code></pre>

The following does work, but I would expect there to be a cleaner way to change the index values. I thought I had used <code>index.set_levels</code> successfully in the past. Would anyone like to tidy this up?

<pre><code>t = mask.reset_index()
t["level_3"] = "D0"
t = t.set_index(list(t.columns.values[:4]))
mask = t.reindex(dfmi.index).fillna(False)
dfmi.loc[mask[0], ("a","bar")] = np.nan
</code></pre>
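One possible way to build the full-length mask the question asks for, offered as a sketch under assumptions rather than a definitive answer: evaluate the condition on the D1 cross-section, test each row's (A, B, C) prefix against the qualifying groups with `MultiIndex.isin`, and combine that with a D0 test on the last level. The names `d1`, `cond`, `hits`, `in_hit_group`, `is_d0`, and `full_mask` are illustrative and do not come from the question. It also hints at why the `swaplevel` attempt likely fails: a boolean array placed inside `pd.IndexSlice` appears to need one entry per row of the whole index (64 here), whereas `mask.values` has only 32.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


# Rebuild the frame from the question so the sketch runs on its own.
miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

# Evaluate the condition on the D1 rows only; xs drops the D level,
# so cond is indexed by (A, B, C).
d1 = dfmi.xs("D1", level=3)
cond = (d1[("a", "bar")] % 3 == 0) & (d1[("a", "foo")] > 100)
hits = cond[cond].index  # (A, B, C) groups whose D1 row qualifies

# Full-length boolean array: the row is D0 and its (A, B, C) prefix is a hit.
in_hit_group = dfmi.index.droplevel(3).isin(hits)
is_d0 = dfmi.index.get_level_values(3) == "D0"
full_mask = in_hit_group & is_d0

# Series.mask replaces the flagged entries with NaN (upcasting to float).
dfmi[("a", "bar")] = dfmi[("a", "bar")].mask(full_mask)
```

Because `full_mask` has one entry per row of `dfmi`, it could equally be passed to `dfmi.loc[full_mask, ("a", "bar")]`; `Series.mask` is used here only to make the integer-to-float upcast explicit.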
