Record Arrays
Record arrays expose the fields of structured arrays as properties.
The recarray is almost identical to a standard array (which supports
named fields already). The biggest difference is that it can use
attribute lookup to find the fields, and it is constructed using
a record.
class recarray(ndarray):
    """Construct an ndarray that allows field access using attributes.

    This constructor can be compared to ``empty``: it creates a new record
    array but does not fill it with data.
    """

    def __getattribute__(self, attr):
        # See if ndarray has this attr, and return it if so. (note that this
        # means a field with the same name as an ndarray attr cannot be
        # accessed by attribute).
        try:
            return object.__getattribute__(self, attr)
        except AttributeError:  # attr must be a fieldname
            pass

        # look for a field with this name
        fielddict = ndarray.__getattribute__(self, 'dtype').fields
        try:
            res = fielddict[attr][:2]
        except (TypeError, KeyError):
            raise AttributeError("recarray has no attribute %s" % attr)
        obj = self.getfield(*res)

        # At this point obj will always be a recarray, since (see
        # PyArray_GetField) the type of obj is inherited. Next, if obj.dtype is
        # non-structured, convert it to an ndarray. If obj is structured leave
        # it as a recarray, but make sure to convert to the same dtype.type (eg
        # to preserve numpy.record type if present), since nested structured
        # fields do not inherit type.
        if obj.dtype.fields:
            return obj.view(dtype=(self.dtype.type, obj.dtype.fields))
        else:
            return obj.view(ndarray)
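To see what the two return branches in that method mean in practice, here is a small sketch. The dtype and the field names pos, x, y and tag are made up for illustration; only np.recarray and view come from NumPy itself.

```python
import numpy as np

# Hypothetical dtype: one nested structured field and one plain integer field.
dt = np.dtype([('pos', [('x', 'f8'), ('y', 'f8')]), ('tag', 'i4')])
rec = np.zeros(3, dtype=dt).view(np.recarray)

tag = rec.tag   # non-structured field: handed back as a base ndarray
pos = rec.pos   # structured field: stays a recarray

print(type(tag).__name__)
print(type(pos).__name__)
print(pos.x)    # attribute access keeps working one level down
```

The nested field keeps its attribute-style access, while the plain field comes back as an ordinary array with no recarray behaviour attached.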
Collection of utilities to manipulate structured arrays.
Most of these functions were initially implemented by John Hunter for
matplotlib. They have been rewritten and extended for convenience.
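In a nutshell, you should generally use structured arrays rather than recarrays, because structured arrays are faster and the only advantage of recarrays is that they let you write arr.x instead of arr['x']. That can be a convenient shortcut, but it is also error-prone if your column names conflict with numpy methods or attributes. See the excerpt from @jakevdp's book for a more detailed explanation. In particular, he notes that simply accessing a column of a structured array can be 20 to 30 times faster than accessing a column of a recarray. However, his example uses a very small dataframe with only 4 rows and does not perform any standard operations.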
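A minimal sketch of the shortcut and of the name-conflict pitfall. The column names x and mean are assumptions chosen so that one of them collides with an ndarray method.

```python
import numpy as np

# Hypothetical columns: 'x' is harmless, 'mean' collides with ndarray.mean.
arr = np.array([(1.0, 10.0), (2.0, 20.0)],
               dtype=[('x', 'f8'), ('mean', 'f8')])
rec = arr.view(np.recarray)

x_by_attr = rec.x      # shortcut for rec['x']
x_by_key = rec['x']

shadowed = rec.mean    # the bound ndarray method, not the column
column = rec['mean']   # indexing always reaches the field

print(callable(shadowed))
print(column)
```

Indexing with the field name works for both kinds of array, which is one reason to prefer it when column names are not under your control.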
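For simple operations on larger dataframes the difference is likely to be much smaller, although structured arrays are still faster. For example, here are a structured array and a record array with 10,000 rows each (the code that creates the arrays from a dataframe is borrowed from @jpp's answer).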
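One way to build such a pair, as a sketch only: it assumes plain NumPy columns stand in for the dataframe, and the column names x and y are illustrative rather than taken from the referenced answer.

```python
import numpy as np

n = 10_000
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
y = x.astype(int)

# Structured array: allocate with a compound dtype, then fill column by column.
struct_array = np.empty(n, dtype=[('x', 'f8'), ('y', 'i8')])
struct_array['x'] = x
struct_array['y'] = y

# Record array holding the same columns, built from the column arrays.
rec_array = np.rec.fromarrays([x, y], names='x,y')

print(type(struct_array).__name__, type(rec_array).__name__)
```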
If we perform a standard operation, such as multiplying a column by 2, it is about 50% faster for the structured array:
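A rough way to run that comparison yourself, using timeit from the standard library. This is a sketch under assumptions: the array contents, the repeat count and the use of attribute access on the recarray side are mine, and the numbers you get will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

n = 10_000
struct_array = np.zeros(n, dtype=[('x', 'f8'), ('y', 'i8')])
struct_array['x'] = np.arange(n)
rec_array = struct_array.view(np.recarray)

# Same operation, two access paths: item lookup vs attribute lookup.
t_struct = timeit.timeit(lambda: struct_array['x'] * 2, number=2_000)
t_rec = timeit.timeit(lambda: rec_array.x * 2, number=2_000)

print(f"structured: {t_struct:.4f} s, recarray: {t_rec:.4f} s")
```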
Records/recarrays are implemented in
https://github.com/numpy/numpy/blob/master/numpy/core/records.py
Some relevant quotes from this file appear at the top of this page.
recarray is a subclass of ndarray (in the same way that matrix and masked arrays are). Note, however, that its constructor differs from np.array; it is more like np.empty(size, dtype).
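A quick sketch of that difference, assuming a made-up two-field layout (x and y are illustrative names): np.recarray takes a shape and a dtype and hands back uninitialised memory, the way np.empty does, rather than converting existing data the way np.array does.

```python
import numpy as np

dt = [('x', 'f8'), ('y', 'i4')]    # hypothetical field layout

rec = np.recarray((3,), dtype=dt)  # allocated but not filled, like np.empty
raw = np.empty((3,), dtype=dt)

# Contents are arbitrary until assigned, so fill before reading.
rec['x'] = [1.0, 2.0, 3.0]
rec['y'] = 0

print(rec.shape, raw.shape)
print(rec.dtype.names == raw.dtype.names)
```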
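The key function for implementing the field-as-attribute behavior is __getattribute__ (__getitem__ implements indexing). It first tries to get a regular attribute: things like .shape, .strides, .data, as well as all the methods (.sum, .reshape, and so on). Failing that, it looks the name up in the dtype field names. So it is really just a structured array with some redefined access methods.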
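To make the lookup order concrete, here is a toy subclass that gets a similar effect. It is not how records.py does it: FieldAttrArray is a hypothetical name, and it relies on __getattr__ (which Python only calls after normal attribute lookup has failed) instead of overriding __getattribute__.

```python
import numpy as np

class FieldAttrArray(np.ndarray):
    """Hypothetical toy: fall back to dtype field names for unknown attributes."""

    def __getattr__(self, name):
        # Reached only when regular lookup (.shape, .sum, ...) found nothing.
        names = self.dtype.names
        if names is not None and name in names:
            return self[name].view(np.ndarray)
        raise AttributeError(
            f"{type(self).__name__} has no attribute {name!r}")

data = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', 'f8'), ('shape', 'i4')])
toy = data.view(FieldAttrArray)

print(toy.x)          # falls through to the field
print(toy.shape)      # regular attribute wins over the field named 'shape'
print(toy['shape'])   # the field is still reachable by indexing
```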
As best I can tell, record array and recarray are the same.
Another file shows something of the history:
https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py
Many of the functions in this file end with:
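A minimal sketch of that kind of ending, not a copy of any function in recfunctions.py. double_x is a hypothetical helper, and the asrecarray flag is assumed to behave like the keyword of the same name that several functions in that module accept.

```python
import numpy as np

def double_x(arr, asrecarray=False):
    """Hypothetical helper: return a copy of arr with field 'x' doubled."""
    output = arr.copy()
    output['x'] = output['x'] * 2
    if asrecarray:
        # Reinterpret the same buffer as a recarray; no data is copied.
        output = output.view(np.recarray)
    return output

arr = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', 'f8'), ('y', 'i4')])
plain = double_x(arr)
as_rec = double_x(arr, asrecarray=True)
back = as_rec.view(np.ndarray)   # and back again, still without copying

print(type(plain).__name__, type(as_rec).__name__, type(back).__name__)
```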
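The fact that you can return an array as a recarray view shows how thin this layer is.
numpy has a long history and merges several independent projects. My impression is that recarray is an older idea, and that structured arrays are the current implementation, built on a generalized dtype. recarrays seem to be kept for convenience and backward compatibility. But I would have to study the github file history, and any recent issues and pull requests, to be sure.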