Vectorized comparison of a value against a set of values

testAgainst = np.repeat(dfGroups['start'].values[np.newaxis, :], repeats=10, axis=0)

array([[  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000],
       [  10, 1000, 3600, 3700, 4000, 4200, 4300, 4700, 5000, 6000, 6200, 7000, 7700, 9000]])

Traceback (most recent call last):
  File "/home/foo/.conda/envs/myenv3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-28-1bce7761846c>", line 1, in <module>
    df.iloc[:10]['occ'] < testAgainst
  File "/home/foo/.conda/envs/myenv3/lib/python3.5/site-packages/pandas/core/ops.py", line 832, in wrapper
    return self._constructor(na_op(self.values, np.asarray(other)),
  File "/home/foo/.conda/envs/myenv3/lib/python3.5/site-packages/pandas/core/ops.py", line 792, in na_op
    result = getattr(x, name)(y)
ValueError: operands could not be broadcast together with shapes (10,) (10,14)

1 Answer

User

#1 · Posted 2024-06-26 13:36:36

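1) Broadcasting fails because the Series object forms a 1-D labelled array [shape=(10,)], which is being compared against a 2-D array [shape=(1, 14)]. NumPy aligns shapes starting from the trailing dimension, so the series' length of 10 is matched against the 14 columns and the comparison cannot proceed. (Here testAgainst appears to be a single row of the test array; the repeated (10, 14) version in the traceback fails for the same reason.)

Let's consider: ser = df.iloc[:10]['occ']

If you do: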

>>> ser.iloc[0] < testAgainst
array([[False, False, False, False, False, False,  True,  True,  True,
         True,  True,  True,  True,  True]], dtype=bool)

This means that if you could apply the same comparison to every row of the series, it would give the correct result.
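
A minimal sketch of such a row-by-row comparison, for illustration only. The values in `ser` are hypothetical stand-ins (the question's `df` is not shown), and `testAgainst` is assumed to be a single row built from the `start` values printed in the question:

```python
import numpy as np
import pandas as pd

# The 'start' values as printed in the question.
start = np.array([10, 1000, 3600, 3700, 4000, 4200, 4300,
                  4700, 5000, 6000, 6200, 7000, 7700, 9000])
testAgainst = start[np.newaxis, :]  # shape (1, 14)

# Hypothetical stand-in for df.iloc[:10]['occ'].
ser = pd.Series([4250, 500, 8000, 7800, 20, 3900, 4299, 7100, 4100, 4150])

# Compare one scalar at a time; each comparison yields a (1, 14) boolean
# array, which is flattened into a plain list of 14 booleans.
row_by_row = [(value < testAgainst).ravel().tolist() for value in ser]
result_slow = pd.Series(row_by_row, index=ser.index)
```

Each element of `result_slow` is a list of 14 booleans, one per `start` value.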

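However, applying the comparison row by row is very slow because it is not vectorized, so it is impractical for a large number of rows.

What you can do instead is reshape the series so that an extra dimension is inserted into it.

This lets NumPy match the two shapes, (10, 1) for the series and (1, 14) for the array, pairing them dimension by dimension so that they can be compared.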
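
The shape matching described above can be sketched as follows. The values in `ser` are hypothetical stand-ins for `df.iloc[:10]['occ']`, and wrapping the result in a DataFrame is an optional extra, not part of the original answer:

```python
import numpy as np
import pandas as pd

start = np.array([10, 1000, 3600, 3700, 4000, 4200, 4300,
                  4700, 5000, 6000, 6200, 7000, 7700, 9000])
testAgainst = start[np.newaxis, :]  # shape (1, 14)

# Hypothetical stand-in for df.iloc[:10]['occ'].
ser = pd.Series([4250, 500, 8000, 7800, 20, 3900, 4299, 7100, 4100, 4150])

# Insert a trailing axis so the series values form a column of shape (10, 1).
column = ser.values[:, np.newaxis]

# (10, 1) against (1, 14): each size-1 axis is stretched to match the other
# operand, so the comparison yields a (10, 14) boolean array.
mask = column < testAgainst
print(column.shape, testAgainst.shape, mask.shape)

# Optional: keep the series index and label the columns by threshold.
mask_df = pd.DataFrame(mask, index=ser.index, columns=start)
```

Keeping the result as a 2-D array or DataFrame, rather than a Series of lists, makes it easier to run further vectorized operations such as `mask.sum(axis=1)`.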

2) A better solution would be:

>>> pd.Series((ser.values[:, None] < testAgainst).tolist())   # same as ser.values.reshape(-1,1)

Resulting output:

0    [False, False, False, False, False, False, Tru...
1    [False, True, True, True, True, True, True, Tr...
2    [False, False, False, False, False, False, Fal...
3    [False, False, False, False, False, False, Fal...
4    [False, True, True, True, True, True, True, Tr...
5    [False, False, False, False, True, True, True,...
6    [False, False, False, False, False, False, Tru...
7    [False, False, False, False, False, False, Fal...
8    [False, False, False, False, False, True, True...
9    [False, False, False, False, False, True, True...
dtype: object

Note: a single row of the test array is sufficient; you do not need to repeat the array to match the shape of the series object.
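
A quick check of this point, as a sketch with hypothetical comparison values: the single-row array and the version repeated with `np.repeat` should produce the same mask, while the repeated copy uses ten times the memory.

```python
import numpy as np

start = np.array([10, 1000, 3600, 3700, 4000, 4200, 4300,
                  4700, 5000, 6000, 6200, 7000, 7700, 9000])

# Hypothetical values standing in for the series being compared.
values = np.array([4250, 500, 8000, 7800, 20, 3900, 4299, 7100, 4100, 4150])

single = start[np.newaxis, :]                              # shape (1, 14)
repeated = np.repeat(single, repeats=len(values), axis=0)  # shape (10, 14)

# Both comparisons broadcast to (10, 14).
mask_single = values[:, np.newaxis] < single
mask_repeated = values[:, np.newaxis] < repeated

same = np.array_equal(mask_single, mask_repeated)
print(same, single.nbytes, repeated.nbytes)
```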
