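I have two np.ndarrays: a is an array of shape (13000, 8, 315000) and dtype uint8, and b is an array of shape (8,) and dtype float32.
I want to multiply each slice along the second dimension (8) by the corresponding element of b, then sum along that dimension (i.e. a dot product along the second axis). The result would have shape (13000, 315000).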
I came up with two ways to do this:
np.einsum('ijk,j->ik', a, b): with %timeit it gives 49 s ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.dot(a.transpose(0, 2, 1), b): with %timeit it gives 1min 8s ± 3.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
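For reference, the two expressions can be checked against each other on small stand-in arrays. This is only an illustrative sketch: the shapes (13, 8, 31) and the random data are assumptions chosen so that it runs quickly, not the real inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for the real arrays: same dtypes, much smaller shapes.
a = rng.integers(0, 256, size=(13, 8, 31), dtype=np.uint8)
b = rng.random(8, dtype=np.float32)

# Approach 1: contract the second axis of a against b.
via_einsum = np.einsum('ijk,j->ik', a, b)

# Approach 2: move the length-8 axis last, then take the dot product with b.
via_dot = np.dot(a.transpose(0, 2, 1), b)

print(via_einsum.shape)  # (13, 31)
print(np.allclose(via_einsum, via_dot, rtol=1e-4))
```

Both produce an array with the second axis summed out, so they should agree up to floating-point rounding.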
Is there a faster option?
np.show_config() returns:
blas_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blis_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
a.flags:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
b.flags:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
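We can leverage [...] with the [...] module for large data to gain memory efficiency, and hence performance -
Sample run for timings -
~2.6x improvement.
A better (less error-prone) and generic way to create the evaluate part would be as follows -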
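As a rough sketch of what such a generic construction could look like, assuming the module meant here is numexpr (the name is not preserved above, so this is a guess): build the expression string and the operand dictionary in a loop instead of typing out every term by hand. The helper name build_weighted_sum, the small array shapes, and the plain NumPy fallback are all hypothetical and only there for illustration.

```python
import numpy as np

def build_weighted_sum(a, b):
    """Return (expression, operands) for sum_j a[:, j] * b[j]."""
    operands = {}
    terms = []
    for j in range(b.shape[0]):
        operands[f"a{j}"] = a[:, j]  # a view into a, not a copy
        operands[f"b{j}"] = b[j]     # scalar weight for slice j
        terms.append(f"a{j}*b{j}")
    return "+".join(terms), operands

rng = np.random.default_rng(0)
# Small stand-ins for the real arrays (assumed shapes, same dtypes).
a = rng.integers(0, 256, size=(13, 8, 31), dtype=np.uint8)
b = rng.random(8, dtype=np.float32)

expr, operands = build_weighted_sum(a, b)
print(expr)  # a0*b0+a1*b1+a2*b2+a3*b3+a4*b4+a5*b5+a6*b6+a7*b7

try:
    # Assumption: numexpr is the module in question and is installed.
    import numexpr as ne
    out = ne.evaluate(expr, local_dict=operands)
except ImportError:
    # Fallback so the sketch still runs: accumulate the same terms in NumPy.
    out = np.zeros((a.shape[0], a.shape[2]), dtype=np.float32)
    for j in range(b.shape[0]):
        out += operands[f"a{j}"] * operands[f"b{j}"]

# Compare against the einsum formulation from the question.
print(np.allclose(out, np.einsum('ijk,j->ik', a, b), rtol=1e-4))
```

Generating the names in a loop keeps the expression and the dictionary in sync, which is where hand-written versions tend to go wrong when the length of b changes.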