Why are NumPy masked arrays useful?

>>> data = np.arange(12).reshape(3, 4)
>>> mask = np.array([[0., 0., 1., 0.],
...                  [0., 0., 0., 1.],
...                  [0., 1., 0., 0.]])
>>> masked = np.ma.array(data, mask=mask)
>>> masked
masked_array(
  data=[[0, 1, --, 3],
        [4, 5, 6, --],
        [8, --, 10, 11]],
  mask=[[False, False, True, False],
        [False, False, False, True],
        [False, True, False, False]],
  fill_value=999999)
>>> masked.sum(axis=0)
masked_array(data=[12, 6, 16, 14],
             mask=[False, False, False, False],
       fill_value=999999)

1 answer

User

#1 · Posted on 2024-09-25 10:20:41

The official answer is given here:

In theory, IEEE nan was specifically designed to address the problem of missing values, but the reality is that different platforms behave differently, making life more difficult. On some platforms, the presence of nan slows calculations 10-100 times. For integer data, no nan value exists.
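The last point of the quote, that integer data has no NaN value, can be illustrated with a minimal sketch. The array name `counts` and the choice of which entry to mask are only illustrative assumptions:

```python
import numpy as np

counts = np.arange(5)  # integer dtype: [0, 1, 2, 3, 4]

# Integers have no NaN representation, so this assignment raises ValueError.
try:
    counts[2] = np.nan
    nan_assignment_failed = False
except ValueError:
    nan_assignment_failed = True

# A masked array can flag the entry as missing and keep the integer dtype.
masked_counts = np.ma.array(counts, mask=[False, False, True, False, False])
total = masked_counts.sum()  # the masked entry is skipped: 0 + 1 + 3 + 4

print(nan_assignment_failed, masked_counts.dtype, total)
```

With a NaN-based approach the data would first have to be converted to floating point, which a masked array avoids.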

In fact, masked arrays can be quite slow compared with equivalent arrays of NaNs:

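import numpy as np

g = np.random.random((5000, 5000))
indx = np.random.randint(0, 4999, (500, 2))

# NaN version: flag the selected entries with np.nan.
# Note: indexing with the (500, 2) integer array selects whole rows,
# so every row listed in indx is flagged, not 500 individual points.
g_nan = g.copy()
g_nan[indx] = np.nan

# Masked version: the same selection, stored in a boolean mask instead.
mask = np.full((5000, 5000), False, dtype=bool)
mask[indx] = True
g_mask = np.ma.array(g, mask=mask)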

%timeit (g_mask + g_mask)**2
1.27 s ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit (g_nan + g_nan)**2
76.5 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

When are they useful?

Over many years of programming, I have found them useful in the following situations:

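When you want to keep the masked values for later processing, without copying the array.
When you don't want to be tricked by the odd behaviour of NaN operations (although you might be tricked by the behaviour of masked arrays instead).
When you have to handle many arrays together with their masks: if the mask is part of the array, you avoid extra code and confusion.
When you want masked values to mean something different from NaN values. For example, I use np.nan for missing values, but I also mask values with a poor signal-to-noise ratio, so I can identify both.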
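Two of the situations above, keeping the masked values without copying and giving the mask a different meaning from NaN, can be sketched as follows. The `signal` and `snr` arrays and the threshold of 1.0 are made-up illustrative values, not taken from the answer:

```python
import numpy as np

signal = np.array([1.0, np.nan, 3.0, 4.0, 5.0])  # NaN marks a missing sample
snr = np.array([10.0, 10.0, 0.5, 10.0, 10.0])    # hypothetical SNR values

# copy=False asks NumPy to reuse the existing buffer instead of copying it.
masked = np.ma.array(signal, mask=(snr < 1.0), copy=False)

# The masked value is hidden from computations but is still stored.
hidden_value = masked.data[2]

# The two kinds of "bad" value stay distinguishable.
missing = np.isnan(masked.data)        # missing samples
low_snr = np.ma.getmaskarray(masked)   # samples masked for poor SNR

# For statistics, mask both kinds together.
clean_mean = np.ma.array(masked.data, mask=missing | low_snr).mean()

print(hidden_value, missing, low_snr, clean_mean)
```

Note that `masked.mean()` on its own would still return NaN here, because the NaN sample is not masked. This is one way the two conventions can interact unexpectedly.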

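In general, you can think of a masked array as a more compact representation. The best approach is to test, case by case, which solution is easier to understand and more efficient.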
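One way to do such a case-by-case test outside IPython is a small `timeit` harness. This is only a sketch: the array size, the 1% fraction of flagged entries, and the helper name `best_time` are assumptions chosen to keep the run short, and the timings will vary by machine and NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
values = rng.random((500, 500))         # much smaller than the 5000x5000 case
bad = rng.random(values.shape) < 0.01   # hypothetical: about 1% flagged

with_nan = values.copy()
with_nan[bad] = np.nan
with_mask = np.ma.array(values, mask=bad)

def best_time(func, repeats=5):
    """Return the best wall-clock time in seconds over a few single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeats))

t_nan = best_time(lambda: (with_nan + with_nan) ** 2)
t_mask = best_time(lambda: (with_mask + with_mask) ** 2)
print(f"nan: {t_nan:.4f} s, masked: {t_mask:.4f} s")
```

Running the same harness on your own data and operations gives a more relevant comparison than any single published number.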
