numpy ndarray hashability

>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True

>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]

2 Answers

User
#1 · edited 2024-05-03 11:26:10

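I get the same results with Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None) and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick Google search I found a post on the python.scientific.devel mailing list stating that arrays were never intended to be hashable, so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.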
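Edit: Note that nparray.__hash__() returns the same value as id(nparray), so this is just the default implementation. Perhaps it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C-API magic to achieve this in a way that does not propagate to subclasses and does not stop you from calling ndarray.__hash__ explicitly?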
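In the question's transcript the __hash__() values are not equal to id(), yet they still look like the default object hash. A minimal sketch, assuming that CPython's default hash in that interpreter rotates the object's address right by 4 bits on a 64-bit build and reinterprets the result as a signed integer (an implementation detail, not a documented guarantee). The helper name pointer_hash is hypothetical; the addresses and hashes are copied from the question.

```python
def pointer_hash(address, bits=64):
    """Rotate a pointer-sized address right by 4 bits, then read it as signed.

    Assumption: this mirrors CPython's default, address-based object hash.
    """
    mask = (1 << bits) - 1
    rotated = ((address >> 4) | (address << (bits - 4))) & mask
    # Reinterpret the unsigned result as a signed integer of the same width.
    if rotated >= 1 << (bits - 1):
        rotated -= 1 << bits
    return rotated


# (id, __hash__()) pairs taken from the question's interpreter session.
observed = {
    4315346832: 269709177,             # nparray
    4315234352: 269702147,             # ndarray
    4299616456: -9223372036586049780,  # vector
}

for address, reported in observed.items():
    print(address, pointer_hash(address), pointer_hash(address) == reported)
```

Under that assumption all three pairs match, including the large negative value for vector: its address has a nonzero low nibble, which the rotation moves into the sign bit.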
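Things are different with Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False, as expected. hash(vector) also fails.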
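The __hash__ = None mechanism can be tried without numpy. The sketch below is pure Python 3 and uses hypothetical stand-in classes (ArrayLike, VectorLike, HashableVector), not numpy's actual implementation. It shows that the opt-out is inherited by a plain subclass, and that a subclass can opt back in by defining __hash__ itself. It imports Hashable from collections.abc, where that ABC lives in current Python 3.

```python
from collections.abc import Hashable


class ArrayLike:
    """Hypothetical stand-in for a type that opts out of hashing."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, ArrayLike) and self.values == other.values

    __hash__ = None  # explicit opt-out; hash(instance) raises TypeError


class VectorLike(ArrayLike):
    """Plain subclass: inherits __hash__ = None, so it is unhashable too."""


class HashableVector(ArrayLike):
    """Subclass that opts back in by hashing a snapshot of its contents."""

    def __hash__(self):
        return hash(tuple(self.values))


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


for obj in (ArrayLike([0.0]), VectorLike([0.0]), HashableVector([0.0])):
    print(type(obj).__name__, is_hashable(obj), isinstance(obj, Hashable))
```

Hashing mutable contents, as HashableVector does, is only safe if the values never change while the object is used as a dictionary key or set member.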

User
#2 · edited 2024-05-03 11:26:10

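This is not a definitive answer, but here are some leads for understanding this behaviour.
I am referring here to the numpy code of the 1.6.1 release.
According to the implementation of the numpy.ndarray object (see numpy/core/src/multiarray/arrayobject.c), the hash method is set to NULL.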
NPY_NO_EXPORT PyTypeObject PyArray_Type = {
#if defined(NPY_PY3K)
    PyVarObject_HEAD_INIT(NULL, 0)
#else
    PyObject_HEAD_INIT(NULL)
    0,                                          /* ob_size */
#endif
    "numpy.ndarray",                            /* tp_name */
    sizeof(PyArrayObject),                      /* tp_basicsize */
    &array_as_mapping,                          /* tp_as_mapping */
    (hashfunc)0,                                /* tp_hash */
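This tp_hash attribute appears to be overridden in numpy/core/src/multiarray/multiarraymodule.c. See DUAL_INHERIT, DUAL_INHERIT2, and the initmultiarray function, where the tp_hash attribute is modified.
For example: PyArrayDescr_Type.tp_hash = PyArray_DescrHash
According to hashdescr.c, the hash is implemented as follows: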
 * How does this work ? The hash is computed from a list which contains all the
 * information specific to a type. The hard work is to build the list
 * (_array_descr_walk). The list is built as follows:
 *      * If the dtype is builtin (no fields, no subarray), then the list
 *      contains 6 items which uniquely define one dtype (_array_descr_builtin)
 *      * If the dtype is a compound array, one walk on each field. For each
 *      field, we append title, names, offset to the final list used for
 *      hashing, and then append the list recursively built for each
 *      corresponding dtype (_array_descr_walk_fields)
 *      * If the dtype is a subarray, one adds the shape tuple to the list, and
 *      then append the list recursively built for each corresponding type
 *      (_array_descr_walk_subarray)
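Note that the comment above describes hashing of dtype descriptors, not of arrays. A small sketch of what that looks like from Python, assuming a reasonably recent NumPy is installed. The variable names are illustrative only, and the code checks just that equal dtypes hash equally; it does not reproduce the walk over fields and subarrays described in hashdescr.c.

```python
import numpy as np

# Two spellings of the same builtin dtype.
plain = np.dtype("float64")
same_plain = np.dtype(float)

# Two independently constructed but equal compound (structured) dtypes.
record_a = np.dtype([("x", "f8"), ("y", "i4")])
record_b = np.dtype([("x", "f8"), ("y", "i4")])
other_record = np.dtype([("x", "f8"), ("z", "i4")])

print(plain == same_plain, hash(plain) == hash(same_plain))
print(record_a == record_b, hash(record_a) == hash(record_b))
print(record_a == other_record)

# Because dtypes are hashable, they can serve as dictionary keys.
labels = {plain: "double", record_a: "record"}
print(labels[same_plain], labels[record_b])

# The array type itself stays unhashable.
try:
    hash(np.zeros(1))
except TypeError as exc:
    print("ndarray:", exc)
```

So tp_hash being set for the descriptor type is consistent with hash(nparray) failing: the two are different types.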
