提高嵌套循环性能

2024-10-01 15:31:44 发布

您现在位置:Python中文网/ 问答频道 /正文

我在Python里做一个氩液体的分子动力学模拟。我有一个稳定的版本在运行,但是它对100多个原子来说运行缓慢。我将瓶颈标识为下面的嵌套for循环。这是一个力的计算放在一个函数里,这个函数是从我的主.py脚本:

def computeForce(currentPositions):
    potentialEnergy = 0
    force = zeros((NUMBER_PARTICLES,3))
    for iParticle in range(0,NUMBER_PARTICLES-1):
        for jParticle in range(iParticle + 1, NUMBER_PARTICLES):
            distance = currentPositions[iParticle] - currentPositions[jParticle]
            distance = distance - BOX_LENGTH * (distance/BOX_LENGTH).round()
            #note: this is so much faster than scipy.dot()
            distanceSquared = distance[0]*distance[0] + distance[1]*distance[1] + distance[2]*distance[2]            
            if distanceSquared < CUT_OFF_RADIUS_SQUARED:
                r2i = 1. / distanceSquared
                r6i = r2i*r2i*r2i
                lennardJones = 48. * r2i * r6i * (r6i - 0.5)
                force[iParticle] += lennardJones*distance
                force[jParticle] -= lennardJones*distance
                potentialEnergy += 4.* r6i * (r6i - 1.) - CUT_OFF_ENERGY
return(force,potentialEnergy)

大写字母中的变量是常量,定义为配置.py文件。”“当前位置”是一个3个粒子数的矩阵。在

我已经用实现了嵌套for循环的一个版本细纹织物,它的灵感来自这个网站:http://www.scipy.org/PerformancePython。在

然而,我不喜欢失去灵活性。我对“矢量化”for循环很感兴趣。我真的不明白这是怎么回事。有人能给我一个提示或者一个好的教程来教我这个吗?在


Tags: 函数版本numberfordistanceforceparticleslennardjones
3条回答

为了完成这篇文章,我添加了我在编织C代码中的实现。请注意,您需要导入weave和转换器才能工作。此外,weave目前只适用于python2.7。再次感谢你的帮助!它的运行速度比矢量化版本快10倍。

from scipy import weave
from scipy.weave import converters
def computeForceC(currentPositions):        
    code = """
    using namespace blitz;
    Array<double,1> distance(3);
    double distanceSquared, r2i, r6i, lennardJones;
    double potentialEnergy = 0.;

    for( int iParticle = 0; iParticle < (NUMBER_PARTICLES - 1); iParticle++){
        for( int jParticle = iParticle + 1; jParticle < NUMBER_PARTICLES; jParticle++){
            distance(0) = currentPositions(iParticle,0)-currentPositions(jParticle,0);
            distance(0) = distance(0) - BOX_LENGTH * round(distance(0)/BOX_LENGTH);
            distance(1) = currentPositions(iParticle,1)-currentPositions(jParticle,1);
            distance(1) = distance(1) - BOX_LENGTH * round(distance(1)/BOX_LENGTH);
            distance(2) = currentPositions(iParticle,2)-currentPositions(jParticle,2);
            distance(2) = distance(2) - BOX_LENGTH * round(distance(2)/BOX_LENGTH);
            distanceSquared = distance(0)*distance(0) + distance(1)*distance(1) + distance(2)*distance(2);
            if(distanceSquared < CUT_OFF_RADIUS_SQUARED){
                r2i = 1./distanceSquared;
                r6i = r2i * r2i * r2i;
                lennardJones = 48. * r2i * r6i * (r6i - 0.5);
                force(iParticle,0) += lennardJones*distance(0);
                force(iParticle,1) += lennardJones*distance(1);
                force(iParticle,2) += lennardJones*distance(2);
                force(jParticle,0) -= lennardJones*distance(0);
                force(jParticle,1) -= lennardJones*distance(1);
                force(jParticle,2) -= lennardJones*distance(2);
                potentialEnergy += 4.* r6i * (r6i - 1.)-CUT_OFF_ENERGY;

                }

            }//end inner for loop
    }//end outer for loop
    return_val = potentialEnergy;

    """
    #args that are passed into weave.inline and created inside computeForce
    #potentialEnergy = 0.
    force = zeros((NUMBER_PARTICLES,3))

    #all args
    arguments = ['currentPositions','force','NUMBER_PARTICLES','CUT_OFF_RADIUS_SQUARED','BOX_LENGTH','CUT_OFF_ENERGY']
    #evaluate stuff in code
    potentialEnergy = weave.inline(code,arguments,type_converters = converters.blitz,compiler = 'gcc')    

    return force, potentialEnergy

下面是我的矢量化版本的代码。对于1000点的数据集,我的代码大约比原始代码快50倍:

In [89]: xyz = 30 * np.random.uniform(size=(1000, 3))

In [90]: %timeit a0, b0 = computeForce(xyz)
1 loops, best of 3: 7.61 s per loop

In [91]: %timeit a, b = computeForceVector(xyz)
10 loops, best of 3: 139 ms per loop

代码:

^{pr2}$

我还检查了代码是否产生相同的结果

用纯python编写类似MD引擎的东西会很慢。我想看看Numba(http://numba.pydata.org/)或Cython(http://cython.org/)。在Cython方面,我用Cython编写了一个简单的Langevin Dynamics引擎,它可以作为一个示例来帮助您开始:

https://bitbucket.org/joshua.adelman/pylangevin-integrator

另一个我很喜欢的选择是使用OpenMM。有一个python包装器,允许您将MD引擎的所有部分组合在一起,实现定制的功能等。它还可以针对GPU设备:

https://simtk.org/home/openmm

不过,总的来说,有很多高度优化的MD代码可用,除非您是为了某种一般的教育目的而这样做,否则从头开始编写自己的代码是没有意义的。一些主要代码,仅举几个例子:

相关问题 更多 >

    热门问题