How to optimize a dot product along a small dimension in NumPy?

Additional information

Output of np.show_config():

blas_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
    NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blis_info:
    NOT AVAILABLE
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]

a.flags:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

b.flags:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
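Judging from the einsum call in the answer below, the operation is out[i, k] = sum over j of a[i, j, k] * b[j], where j has length 8. The flags above are what freshly allocated arrays of this kind report. The following scaled-down sketch reproduces them together with the einsum baseline; the shapes and random data are assumptions for illustration, not the asker's actual arrays.

```python
import numpy as np

# Hypothetical small shapes; the answer below times (1000, 8, 30000),
# which it describes as ~10x smaller than the question's arrays.
rng = np.random.default_rng(0)
a = rng.integers(0, 9, size=(10, 8, 300)).astype(np.uint8)
b = rng.random(8).astype(np.float32)

# A freshly allocated 3-D array is C-contiguous but not F-contiguous.
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])  # True False
# A contiguous 1-D array counts as both C- and F-contiguous.
print(b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])  # True True

# Baseline: contract the length-8 middle axis j.
out = np.einsum('ijk,j->ik', a, b)
print(out.shape, out.dtype)  # (10, 300) float32
```

The 1-D case explains why b.flags shows both C_CONTIGUOUS and F_CONTIGUOUS as True while a shows only C_CONTIGUOUS.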

1 answer

User

Post #1 · Posted on 2024-06-01 08:49:12

We can leverage the numexpr module to work with large data memory-efficiently, and hence gain performance:

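import numexpr as ne

# a: uint8 array of shape (N, 8, M); b: float32 array of shape (8,)
# (see the timing setup). Each aI is the view a[:, I] (no copy), each bI the scalar b[I].
d = {'a0': a[:, 0], 'b0': b[0], 'a1': a[:, 1], 'b1': b[1],
     'a2': a[:, 2], 'b2': b[2], 'a3': a[:, 3], 'b3': b[3],
     'a4': a[:, 4], 'b4': b[4], 'a5': a[:, 5], 'b5': b[5],
     'a6': a[:, 6], 'b6': b[6], 'a7': a[:, 7], 'b7': b[7]}
# The dot product along axis 1, unrolled into 8 multiply-adds.
eval_str = 'a0*b0 + a1*b1 + a2*b2 + a3*b3 + a4*b4 + a5*b5 + a6*b6 + a7*b7'
# numexpr evaluates the expression block by block, avoiding the full-size
# temporaries plain NumPy would allocate for each product and partial sum.
out = ne.evaluate(eval_str, d)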
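As a sanity check that does not require numexpr, the same unrolled expression can be written in plain NumPy and compared with einsum. This is a sketch on small random arrays (the shapes are assumptions). It also confirms that the a[:, i] entries in the dict are views rather than copies.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.integers(0, 9, size=(4, 8, 5)).astype(np.uint8)
b = rng.random(8).astype(np.float32)

# a[:, i] is a strided view into a, so building the dict copies nothing.
assert np.shares_memory(a, a[:, 0])

# Plain-NumPy equivalent of 'a0*b0 + a1*b1 + ... + a7*b7'.
# Each term allocates a temporary array; numexpr avoids that by
# evaluating the whole expression block by block.
unrolled = sum(a[:, i] * b[i] for i in range(8))

reference = np.einsum('ijk,j->ik', a, b)
print(np.allclose(unrolled, reference))  # True
```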

Sample timing run:

In [474]: # Setup with ~10x smaller than posted one, as my system can't handle those
     ...: np.random.seed(0)
     ...: a = np.random.randint(0,9,(1000,8,30000)).astype(np.uint8)
     ...: b = np.random.rand(8).astype(np.float32)

In [478]: %timeit np.einsum('ijk,j->ik', a, b)
1 loop, best of 3: 247 ms per loop

# einsum with optimize flag set as True
In [479]: %timeit np.einsum('ijk,j->ik', a, b, optimize=True)
1 loop, best of 3: 248 ms per loop

In [480]: d = {'a0':a[:,0],'b0':b[0],'a1':a[:,1],'b1':b[1],\
     ...:      'a2':a[:,2],'b2':b[2],'a3':a[:,3],'b3':b[3],\
     ...:      'a4':a[:,4],'b4':b[4],'a5':a[:,5],'b5':b[5],\
     ...:      'a6':a[:,6],'b6':b[6],'a7':a[:,7],'b7':b[7]}

In [481]: eval_str = 'a0*b0 + a1*b1 + a2*b2 + a3*b3 + a4*b4 + a5*b5 + a6*b6 + a7*b7'

In [482]: %timeit ne.evaluate(eval_str,d)
10 loops, best of 3: 94.3 ms per loop

~2.6x improvement over einsum.
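Whether a similar speedup holds depends on the machine, the thread count, and the array sizes. A minimal benchmark sketch to check it locally follows; the shapes are deliberately small assumptions so it runs anywhere, and the numexpr case is skipped when the package is not installed.

```python
import timeit

import numpy as np

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None

rng = np.random.default_rng(0)
# Hypothetical shapes; scale toward (1000, 8, 30000) to match the answer.
a = rng.integers(0, 9, size=(100, 8, 3000)).astype(np.uint8)
b = rng.random(8).astype(np.float32)
n = b.shape[0]

d = {f'a{i}': a[:, i] for i in range(n)}
d.update({f'b{i}': b[i] for i in range(n)})
eval_str = ' + '.join(f'a{i}*b{i}' for i in range(n))

candidates = {
    'einsum': lambda: np.einsum('ijk,j->ik', a, b),
    'einsum(optimize=True)': lambda: np.einsum('ijk,j->ik', a, b, optimize=True),
}
if ne is not None:
    candidates['numexpr'] = lambda: ne.evaluate(eval_str, d)

reference = np.einsum('ijk,j->ik', a, b)
timings = {}
for name, fn in candidates.items():
    assert np.allclose(fn(), reference)  # same result before timing
    # Best of 3 repeats of 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    timings[name] = best
    print(f'{name:>22}: {best * 1e3:.2f} ms')
```

Checking each candidate against the einsum result before timing guards against comparing functions that silently compute different things.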

A better (less error-prone) and more generic way to build the evaluation inputs would be:

n = len(b)  # length of the contracted axis (8 here)
d = {f'a{i}': a[:, i] for i in range(n)}
d.update({f'b{i}': b[i] for i in range(n)})
eval_str = ' + '.join(f'a{i}*b{i}' for i in range(n))
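The generic construction can be wrapped in a small helper and validated without numexpr: Python's built-in eval() on the same string, with the dict as its namespace, computes the expression with ordinary NumPy arithmetic. The helper name build_numexpr_inputs is hypothetical, and the small arrays are for illustration only.

```python
import numpy as np

def build_numexpr_inputs(a, b):
    """Hypothetical helper: build the name->array dict and the expression
    string that contract axis 1 of `a` with the 1-D array `b`."""
    n = b.shape[0]
    if a.shape[1] != n:
        raise ValueError(f'axis 1 of a has length {a.shape[1]}, b has length {n}')
    d = {f'a{i}': a[:, i] for i in range(n)}
    d.update({f'b{i}': b[i] for i in range(n)})
    eval_str = ' + '.join(f'a{i}*b{i}' for i in range(n))
    return d, eval_str

a = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
b = np.array([0.5, 1.0, 2.0], dtype=np.float32)
d, eval_str = build_numexpr_inputs(a, b)
print(eval_str)  # a0*b0 + a1*b1 + a2*b2

# Plain-NumPy evaluation of the generated string, checked against einsum.
out = eval(eval_str, {}, d)
print(np.allclose(out, np.einsum('ijk,j->ik', a, b)))  # True
```

The length check catches a mismatch between a and b up front, instead of producing an expression that silently references missing or extra names.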
