NumPy: standard deviation of values above/below the mean

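2 answers

User

Answer 1 · edited 2024-06-29 00:52:06

As I understand it, you want the standard deviation of each column, computed only over the values that fall below that column's mean.

In NumPy, the simplest way to do this is with a masked array.

For example: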

import numpy as np

# 10 samples, 3 columns
p = np.random.random((10, 3))

# Calculate the mean of each column
colmeans = p.mean(axis=0)

# Make a boolean array where our condition is True
mask = p < colmeans

# Find the standard deviation of values in each column below the column's mean.
# For masked arrays, the True values will be masked, so we'll invert the array.
stdleft = np.ma.masked_where(~mask, p).std(axis=0)
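The same masked-array approach extends to the values on the other side of the mean. The following is a minimal sketch, not part of the original answer: the fixed seed, the name `below`, and the per-column cross-check are assumptions added for illustration, and "right" is taken here to mean values greater than or equal to the column mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only so the sketch is reproducible
p = rng.random((10, 3))         # 10 samples, 3 columns

colmeans = p.mean(axis=0)
below = p < colmeans            # True where a value is below its column mean

# masked_where hides entries where the condition is True,
# so each call masks out the values we do NOT want to include.
stdleft = np.ma.masked_where(~below, p).std(axis=0)
stdright = np.ma.masked_where(below, p).std(axis=0)

# Cross-check against a plain per-column computation with boolean indexing.
expected_left = np.array([p[below[:, j], j].std() for j in range(p.shape[1])])
expected_right = np.array([p[~below[:, j], j].std() for j in range(p.shape[1])])

print(np.allclose(np.ma.getdata(stdleft), expected_left))
print(np.allclose(np.ma.getdata(stdright), expected_right))
```

Both results are masked arrays with one entry per column; `np.ma.getdata` returns the underlying plain array if that is what later code expects.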

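You could also use pandas for this, as @SudeepJuvekar mentioned. Performance should be broadly similar, though pandas may be somewhat faster for this particular operation (untested).

User

Answer 2 · edited 2024-06-29 00:52:06

pandas is your friend. Convert the matrix to a pandas DataFrame and use boolean indexing on the DataFrame. Something like this: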

import pandas
mat = pandas.DataFrame(p)

This creates a DataFrame from the original NumPy matrix p. Next, compute the column means of the DataFrame.

m = mat.mean()

This gives an array of size n_par, with one entry for each column of mat. Finally, index mat using the < comparison and apply std to the result.

stdleft = mat[mat < m].std()

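stdright is computed the same way. It takes a couple of minutes to compute on my machine.

Here is the pandas documentation page: http://pandas.pydata.org/

Edit: updated in response to the comments below. You can index the original p in almost the same way.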
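One detail worth checking when comparing the pandas and NumPy results: pandas `std()` defaults to `ddof=1` (sample standard deviation), while NumPy's `std` defaults to `ddof=0`. The sketch below is an illustration under assumptions, not the answerer's code: the seed, the `>=` convention for stdright, and the name `stdleft_ddof0` are introduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, used only so the sketch is reproducible
p = rng.random((10, 3))

mat = pd.DataFrame(p)
m = mat.mean()                  # Series with one mean per column

# Indexing with a boolean frame keeps matching entries and sets the rest to NaN;
# std() skips NaN by default.
stdleft = mat[mat < m].std()    # ddof=1 by default in pandas
stdright = mat[mat >= m].std()

# Pass ddof=0 to match NumPy's default definition.
stdleft_ddof0 = mat[mat < m].std(ddof=0)

# NumPy masked-array result using the same column means, for comparison.
numpy_left = np.ma.masked_where(~(p < m.to_numpy()), p).std(axis=0)
print(np.allclose(stdleft_ddof0.to_numpy(), np.ma.getdata(numpy_left)))
```

With the default `ddof=1`, a column that has only a single value on one side of the mean yields NaN rather than 0, so the choice of `ddof` matters for small samples.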

m = p.mean(axis=0)
logical = p < m

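logical holds a boolean matrix of the same shape as p. This is where pandas comes in handy: you can index a pandas DataFrame directly with a boolean frame of the same shape. Doing the same in NumPy is a bit harder. I suppose a loop is the best way to do it?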

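# Loop over columns (p.shape[1]), not rows; allocate the result first.
stdleft = np.empty(p.shape[1])
for i in range(p.shape[1]):
    stdleft[i] = p[logical[:, i], i].std()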
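A loop is not the only option in plain NumPy. One alternative, sketched here as an assumption rather than as anything proposed in the thread, is to replace the unwanted values with NaN via `np.where` and then use the NaN-aware reduction `np.nanstd`. The seed and the name `loop_left` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only so the sketch is reproducible
p = rng.random((10, 3))

m = p.mean(axis=0)
logical = p < m

# Keep values where the condition holds, put NaN elsewhere,
# then let nanstd ignore the NaN entries column by column.
stdleft = np.nanstd(np.where(logical, p, np.nan), axis=0)
stdright = np.nanstd(np.where(~logical, p, np.nan), axis=0)

# Per-column loop over the same data, for comparison.
loop_left = np.array([p[logical[:, j], j].std() for j in range(p.shape[1])])
print(np.allclose(stdleft, loop_left))
```

This returns ordinary arrays rather than masked arrays, at the cost of building a temporary float copy of p for each side.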
