Is this a Python/NumPy bug, or a subtle gotcha?

import numpy as np

# Change the following line to True to show different behaviour
NEEDS_BUGS = False  # Changeme

# Create some data
data = np.linspace(0, 1, 10)
print(data)

# Create an array of vector functions each of which does a different operation on a set of data
vfuncd = dict()

# Two implementations
if NEEDS_BUGS:
    # Lets do this in a loop because we like loops - However WARNING this does not work!!
    for n in range(10):
        vfuncd[n] = np.vectorize(lambda x: x * n)
else:
    # Unwrap the loop - NOTE: Spoiler - this works
    vfuncd[0] = np.vectorize(lambda x: x * 0)
    vfuncd[1] = np.vectorize(lambda x: x * 1)
    vfuncd[2] = np.vectorize(lambda x: x * 2)
    vfuncd[3] = np.vectorize(lambda x: x * 3)
    vfuncd[4] = np.vectorize(lambda x: x * 4)
    vfuncd[5] = np.vectorize(lambda x: x * 5)
    vfuncd[6] = np.vectorize(lambda x: x * 6)
    vfuncd[7] = np.vectorize(lambda x: x * 7)
    vfuncd[8] = np.vectorize(lambda x: x * 8)
    vfuncd[9] = np.vectorize(lambda x: x * 9)

# Prove we have multiple different vectorised functions
for k, vfunc in vfuncd.items():
    print(k, vfunc)

# Do the work
res = {k: vfuncd[k](data) for k in vfuncd.keys()}

# Show the result
for k, r in res.items():
    print(k, r)

3 answers

User

#1 · edited 2024-10-03 09:14:02

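I don't know exactly what you're trying to achieve, or whether it's a bad idea (as far as np.vectorize is concerned), but the problem you're facing comes from the way Python makes closures. Quoting an answer to a related question: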

Scoping in Python is lexical. A closure will always remember the name and scope of the variable, not the object it's pointing to. Since all the functions in your example are created in the same scope and use the same variable name, they always refer to the same variable.

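In other words, when you close over n, you don't capture the value n has at that moment, only the name. So when n changes, the value seen inside the closure changes too. This was unexpected to me, but others find it natural.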
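
To make the point above concrete, here is a minimal sketch without NumPy. The function name build_and_probe and the probe value 10 are illustrative and not part of the original question. It calls the lambdas once after the loop has finished, then rebinds n and calls them again.

```python
def build_and_probe():
    # Build three lambdas in a loop; each one refers to the name n,
    # not to the value n had when the lambda was created.
    funcs = {}
    for n in range(3):
        funcs[n] = lambda x: x * n

    # After the loop n is 2, so every lambda multiplies by 2.
    after_loop = [funcs[k](10) for k in sorted(funcs)]

    # Rebinding n changes what all three lambdas see.
    n = 5
    after_rebind = [funcs[k](10) for k in sorted(funcs)]
    return after_loop, after_rebind


after_loop, after_rebind = build_and_probe()
print(after_loop)    # [20, 20, 20]
print(after_rebind)  # [50, 50, 50]
```

All three lambdas track the current value of n, which is why the loop version in the question produces ten identical results.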

Here is a fix using partial:

from functools import partial
.
.
.

def func(x, n):
    return x * n

for n in range(10):
    vfuncd[n] = np.vectorize(partial(func, n=n))

Or another one, using a factory function:

def func_factory(n):
    return lambda x: x * n

for n in range(10):
    vfuncd[n] = np.vectorize(func_factory(n))
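
As a quick sanity check, the two fixes above can be compared against plain array multiplication. This is only a sketch: the names scale, make_scaler, via_partial and via_factory are illustrative, and the range is shortened to 3 to keep it small.

```python
from functools import partial

import numpy as np


def scale(x, n):
    return x * n


def make_scaler(n):
    # n is a parameter here, so each returned lambda gets its own n.
    return lambda x: x * n


data = np.linspace(0, 1, 10)

# Both variants fix the multiplier when the function object is created.
via_partial = {n: np.vectorize(partial(scale, n=n)) for n in range(3)}
via_factory = {n: np.vectorize(make_scaler(n)) for n in range(3)}

# Each vectorized function should now use its own multiplier.
all_match = all(
    np.allclose(via_partial[n](data), data * n)
    and np.allclose(via_factory[n](data), data * n)
    for n in range(3)
)
print(all_match)
```

Both approaches work for the same reason: the multiplier becomes an argument of a function call, so it is evaluated once per iteration instead of being looked up later.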

User

#2 · edited 2024-10-03 09:14:02

In [13]: data = np.linspace(0,1,11)

Since the data array can be multiplied directly with a simple expression:

In [14]: data*3                                                                         
Out[14]: array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ])

we don't need the complication of np.vectorize to see the closure problem. A simple lambda is enough:

In [15]: vfuncd = {} 
    ...: for n in range(3): 
    ...:     vfuncd[n] = lambda x:x*n 
    ...:                                                                                
In [16]: vfuncd                                                                         
Out[16]: 
{0: <function __main__.<lambda>(x)>,
 1: <function __main__.<lambda>(x)>,
 2: <function __main__.<lambda>(x)>}
In [17]: {k:v(data) for k,v in vfuncd.items()}                                          
Out[17]: 
{0: array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ]),
 1: array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ]),
 2: array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])}

If we use proper numpy "vectorization", we don't run into the closure problem at all:

In [18]: data * np.arange(3)[:,None]                                                    
Out[18]: 
array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ],
       [0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ]])

Or a simple iteration, if we need a dictionary:

In [20]: {k:data*k for k in range(3)}                                                   
Out[20]: 
{0: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 1: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]),
 2: array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])}

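np.vectorize comes with a speed disclaimer. But it is justified when the function only accepts scalar inputs and we want the flexibility of numpy broadcasting, that is, for 2 or more arguments.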
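
As an illustration of that case, here is a sketch with a hypothetical scalar-only function, capped_scale, which is not from the original question. Its conditional expression would raise an error if it were called directly on a multi-element array, so np.vectorize is used to broadcast it over two arguments.

```python
import numpy as np


def capped_scale(x, n):
    # Scalar-only: the conditional needs a single truth value, so calling
    # this directly on a multi-element array raises ValueError.
    product = x * n
    return product if product < 1.0 else 1.0


data = np.linspace(0, 1, 11)          # shape (11,)
multipliers = np.arange(3)[:, None]   # shape (3, 1)

# otypes is given explicitly so the output dtype does not depend on
# the first element that happens to be evaluated.
capped = np.vectorize(capped_scale, otypes=[float])
result = capped(data, multipliers)    # broadcasts to shape (3, 11)

# The same result with native NumPy operations, for comparison.
expected = np.minimum(data * multipliers, 1.0)
print(result.shape, np.allclose(result, expected))
```

When a native expression such as np.minimum is available it is usually the faster choice; the vectorized wrapper mainly buys convenience.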

Creating multiple vectorize objects is clearly an "anti-pattern". I'd rather see a single vectorize with the appropriate arguments:

In [25]: f = np.vectorize(lambda x,n: x*n)                                              
In [26]: {n: f(data,n) for n in range(3)}                                               
Out[26]: 
{0: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 1: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]),
 2: array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])}

f can also produce the array in Out[18] (but more slowly):

In [27]: f(data, np.arange(3)[:,None])                                                  
Out[27]: 
array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ],
       [0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ]])

User

#3 · edited 2024-10-03 09:14:02

The Python variable n appears to be bound, by name, to the vectorized expression:

for n in range(10):
    vfuncd[n] = np.vectorize(lambda x: x * n)

This fixes it by binding the current value of n as a default argument each time a lambda is created:

for n in range(10):
    vfuncd[n] = np.vectorize(lambda x, n=n: x * n)

In fact, this has a performance impact, as I assume the value of the Python variable has to be fetched repeatedly.
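
A small check of when n is actually read may help here. The sketch below uses illustrative names (compare_bindings, wrapped, bound) and float(n) as a stand-in for any call that wraps n inside the lambda body; it compares that with capturing n as a default argument.

```python
def compare_bindings():
    wrapped = {}
    bound = {}
    for n in range(3):
        # n is still looked up when the lambda is called, so wrapping it
        # in a call inside the body does not freeze its value.
        wrapped[n] = lambda x: x * float(n)
        # A default argument is evaluated when the lambda is created,
        # so each lambda keeps its own n.
        bound[n] = lambda x, n=n: x * n

    wrapped_results = [wrapped[k](10) for k in sorted(wrapped)]
    bound_results = [bound[k](10) for k in sorted(bound)]
    return wrapped_results, bound_results


wrapped_results, bound_results = compare_bindings()
print(wrapped_results)  # [20.0, 20.0, 20.0]
print(bound_results)    # [0, 10, 20]
```

Only expressions evaluated at definition time, such as default arguments or arguments passed to a factory function, capture the per-iteration value.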
