Cython的加速没有预期的那么大

2024-07-08 15:37:01 发布

您现在位置:Python中文网/ 问答频道 /正文

我写了一个Python函数,它计算大数目(N~10^3)粒子之间的成对电磁相互作用,并将结果存储在NxN complex128 ndarray中。它运行,但它是一个更大程序中最慢的部分,当N=900[修正]时,大约需要40秒。原始代码如下所示:

import numpy as np
def interaction(s,alpha,kprop): # s is an Nx3 real array 
                                # alpha is complex
                                # kprop is float

    ndipoles = s.shape[0]

    Amat = np.zeros((ndipoles,3, ndipoles, 3), dtype=np.complex128)
    I = np.array([[1,0,0],[0,1,0],[0,0,1]])
    im = complex(0,1)

    k2 = kprop*kprop

    for i in range(ndipoles):
        xi = s[i,:]
        for j in range(ndipoles):
            if i != j:
                xj = s[j,:]
                dx = xi-xj
                R = np.sqrt(dx.dot(dx))
                n = dx/R
                kR = kprop*R
                kR2 = kR*kR
                A = ((1./kR2) - im/kR)
                nxn = np.outer(n, n)
                nxn = (3*A-1)*nxn + (1-A)*I
                nxn *= -alpha*(k2*np.exp(im*kR))/R
            else:
                nxn = I

            Amat[i,:,j,:] = nxn

    return(Amat.reshape((3*ndipoles,3*ndipoles)))

我以前从未使用过Cython,但这似乎是我加快速度的一个好地方,所以我几乎盲目地采用了我在在线教程中找到的技术。我得到了一些加速(30秒vs.40秒),但没有我预期的那么戏剧化,所以我想知道我是做错了什么,还是错过了关键的一步。以下是我对上述例行公事的最佳尝试:

^{pr2}$

Tags: alphaforisnpk2arraykrim
1条回答
网友
1楼 · 发布于 2024-07-08 15:37:01

NumPy的真正威力在于以向量化的方式跨大量元素执行一个操作,而不是在循环中分块使用该操作。在您的例子中,您使用了两个嵌套循环和一个IF条件语句。我建议扩展中间数组的维数,这将使^{}发挥作用,因此可以一次性对所有元素使用相同的操作,而不是循环中的小数据块。在

为了扩展维度,可以使用^{}/^{}。那么,遵循这样一个前提的向量化实现应该是这样的-

def vectorized_interaction(s,alpha,kprop):

    im = complex(0,1)
    I = np.array([[1,0,0],[0,1,0],[0,0,1]])
    k2 = kprop*kprop

    # Vectorized calculations for dx, R, n, kR, A
    sd = s[:,None] - s 
    Rv = np.sqrt((sd**2).sum(2))
    nv = sd/Rv[:,:,None]
    kRv = Rv*kprop
    Av = (1./(kRv*kRv)) - im/kRv

    # Vectorized calculation for: "nxn = np.outer(n, n)"
    nxnv = nv[:,:,:,None]*nv[:,:,None,:]

    # Vectorized calculation for: "(3*A-1)*nxn + (1-A)*I"
    P = (3*Av[:,:,None,None]-1)*nxnv + (1-Av[:,:,None,None])*I

    # Vectorized calculation for: "-alpha*(k2*np.exp(im*kR))/R"    
    multv = -alpha*(k2*np.exp(im*kRv))/Rv

    # Vectorized calculation for: "nxn *= -alpha*(k2*np.exp(im*kR))/R"   
    outv = P*multv[:,:,None,None]


    # Simulate ELSE part of the conditional statement"if i != j:" 
    # with masked setting to I on the last two dimensions
    outv[np.eye((N),dtype=bool)] = I

    return outv.transpose(0,2,1,3).reshape(N*3,-1)

输出测试和运行时验证-

案例1:

^{pr2}$

案例2:

In [707]: N = 100
     ...: s = np.random.rand(N,3) + complex(0,1)*np.random.rand(N,3)
     ...: alpha = 3j
     ...: kprop = 5.4
     ...: 

In [708]: out_org = interaction(s,alpha,kprop)
     ...: out_vect = vectorized_interaction(s,alpha,kprop)
     ...: print np.allclose(np.real(out_org),np.real(out_vect))
     ...: print np.allclose(np.imag(out_org),np.imag(out_vect))
     ...: 
True
True

In [709]: %timeit interaction(s,alpha,kprop)
1 loops, best of 3: 826 ms per loop

In [710]: %timeit vectorized_interaction(s,alpha,kprop)
100 loops, best of 3: 14 ms per loop

案例三:

In [711]: N = 900
     ...: s = np.random.rand(N,3) + complex(0,1)*np.random.rand(N,3)
     ...: alpha = 3j
     ...: kprop = 5.4
     ...: 

In [712]: out_org = interaction(s,alpha,kprop)
     ...: out_vect = vectorized_interaction(s,alpha,kprop)
     ...: print np.allclose(np.real(out_org),np.real(out_vect))
     ...: print np.allclose(np.imag(out_org),np.imag(out_vect))
     ...: 
True
True

In [713]: %timeit interaction(s,alpha,kprop)
1 loops, best of 3: 1min 7s per loop

In [714]: %timeit vectorized_interaction(s,alpha,kprop)
1 loops, best of 3: 1.59 s per loop

相关问题 更多 >

    热门问题