提高pythonloop计算效率的方法

3条回答

网友

1楼 · 编辑于 2024-09-30 22:12:13

我不确定哪些循环比Python中的其他循环快，但正如我在代码中看到的，有一些重复的函数调用（相同的参数）

例如：

A_mn = A_mn / (2**n * math.factorial(n))
B_mn = B_mn / (2**n * math.factorial(n))

两个声明共享相同的分母。首先尝试计算阶乘，将其保存到变量中并将其用作分母

此外，还会多次调用np.exp（）函数，所有函数都使用相同的参数：

np.exp(-1j*a)/(a)
np.exp(1j * m * matrix_b)

Numpy指数函数的执行时间可能会有点慢

所有np.sin/np.cos（matrix_a）调用都会发生同样的情况

这些更改不会给您带来巨大的改进，但它是：

def function_call_2(npp,nmax):
    matrix_a = np.random.rand(npp)
    matrix_b = np.random.rand(npp)
    a=np.random.rand()
    F = np.zeros((2*npp, 2*nmax*(nmax+2)),dtype=np.complex_) 
    npa=np.arange(npp)

    # Auxiliary variables
    sin_matrix_a = np.sin(matrix_a)
    cos_matrix_a = np.cos(matrix_a)

    for n in range(1,nmax+1,1):

        # Auxilary variables
        denominator = (2**n * math.factorial(n))

        a_n = np.sqrt(1 / (2 * np.pi * n * (n + 1)))
        for m in range(-n,n+1,1):        
            b_m = (-1)**((np.abs(m) + m) / 2)
            p_mn = int(1 / 2 * (np.abs(m) + n + 1 / 2 * (1 - (-1)**(np.abs(m) + n))))
            alpha_mn = np.sqrt(((2 * n + 1) / 2) * math.factorial(n - np.abs(m)) / math.factorial(n + np.abs(m)))
            A_mn = np.zeros(npp)
            B_mn = np.zeros(npp)
            for p in range(p_mn,n+1,1):
                Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p)/math.factorial(2 * p - np.abs(m) - n)
                A_mn = A_mn + Cai_pmn * (cos_matrix_a)**(2 * p - np.abs(m) - n)
                B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (cos_matrix_a)**(np.abs(2 * p - np.abs(m) - n - 1))
            A_mn = A_mn / denominator
            B_mn = B_mn / denominator
            S_mn = alpha_mn * m * A_mn * sin_matrix_a**np.abs(np.abs(m) - 1)
            D_mn = alpha_mn * (np.abs(m) * A_mn * cos_matrix_a * (sin_matrix_a)**(np.abs(np.abs(m) - 1)) - B_mn * (sin_matrix_a)**(np.abs(m) + 1))

            # Auxilary variables
            np_exp_1 = np.exp(-1j*a)/(a)

            np_exp_2 = np.exp(1j * m * matrix_b)

            h1 = 1j**(n+1)*np_exp_1
            h2 = 1j**(n)*np_exp_1

            F_s1_theta = 1j * a_n * b_m * h1 * (S_mn * np_exp_2)
            F_s1_phi =       -a_n * b_m * h1 * (D_mn * np_exp_2)
            F_s2_theta =      a_n * b_m * h2 * (D_mn * np_exp_2)
            F_s2_phi =   1j * a_n * b_m * h2 * (S_mn * np_exp_2)
            j = 2 * (n * (n + 1) + m - 1)
            F[2 * npa, j] = F_s1_theta[npa]
            F[2 * npa+1  , j] = F_s1_phi[npa]
            j = 2 * (n * (n + 1) + m - 1) + 1
            F[2 * npa, j] = F_s2_theta[npa]
            F[2 * npa+1, j] = F_s2_phi[npa]

我做了一个简单的测试，执行速度加快了约3秒：

prev_time_ep =time.time()
npp=500
nmax=80 
function_call(npp,nmax)           
print(" - %s seconds  -" % (time.time() - prev_time_ep))

prev_time_ep =time.time()
function_call_2(npp,nmax)           
print(" - %s seconds  -" % (time.time() - prev_time_ep))

输出：

 - 16.99950885772705 seconds  -
 - 14.231853580474854 seconds  -

网友

2楼 · 编辑于 2024-09-30 22:12:13

我对您的代码做了一些工作，这里是基准测试。瓶颈在于阶乘计算

    ================== PerfTool ==================
task                     |aver(s) |sum(s)  |count   |std     
main loop                |   0.134|  10.712|      80|   0.101
  +-second loop          |   0.134|  10.712|      80|   0.101
    +-A                  |   0.000|   0.245|    6560|   0.000
    +-B                  |   0.001|   5.648|    6560|   0.001
    +-C                  |   0.000|   0.541|    6560|   0.000
    +-D                  |   0.000|   1.505|    6560|   0.000
    +-E                  |   0.000|   1.769|    6560|   0.000
    +-F                  |   0.000|   0.867|    6560|   0.000
mx creation              |   0.000|   0.000|       1|   0.000
preparation              |   0.000|   0.000|       1|   0.000

overall                  |    0.03|   10.71|   39522|-

B和C哨兵是：

      with PerfTool('B'):
                for p in range(p_mn,n+1,1):
                    Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p)/math.factorial(2 * p - np.abs(m) - n)
                    A_mn = A_mn + Cai_pmn * (np.cos(matrix_a))**(2 * p - np.abs(m) - n)
                    B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (np.cos(matrix_a))**(np.abs(2 * p - np.abs(m) - n - 1))

      with PerfTool('C'):
                A_mn = A_mn / (2**n * math.factorial(n))
                B_mn = B_mn / (2**n * math.factorial(n))

正如您所看到的，大部分时间都花在B上，因此我添加了一种缓存，如下所示：

   rng = np.arange(1,nmax+1,1)
    cache = dict(zip(rng,factorial(rng)))
    def get_factorial(w,cache=cache):
        if w not in cache:
            cache[w] = math.factorial(w)
        return cache[w]

要使用而不是math.factorial，可以避免重新计算相同的值

最后，B被重构为B_-vec，这就是邪恶的根源！我已经将代码标记为B_vec_slow，2行占用了大部分时间

            with PerfTool('B_vec'):
                prng = np.arange(p_mn, n+1)
                Cai_pmn_vec = get_factorial(n) * ((-1)**(n + prng)) / (factorial(prng) * factorial(n - prng)) * factorial(2 * prng)/factorial(2 * prng - np.abs(m) - n)
                with PerfTool('B_vec_slow'):
                    A_mn_vec = Cai_pmn_vec*np.power(cos_matrix_a[:,np.newaxis],2 * prng - np.abs(m) - n)
                    B_mn_vec = (2 * prng - np.abs(m) - n) * Cai_pmn_vec * np.power(cos_matrix_a[:,np.newaxis], np.abs(2 * prng - np.abs(m) - n - 1))
                A_mn = np.sum(A_mn_vec,axis=1)
                B_mn = np.sum(B_mn_vec,axis=1)

结果是：

================== PerfTool ==================
task                     |aver(s) |sum(s)  |count   |std     
main loop                |   0.072|   5.736|      80|   0.052
  +-second loop          |   0.072|   5.735|      80|   0.052
    +-A                  |   0.000|   0.194|    6560|   0.000
    +-B_vec              |   0.001|   3.490|    6560|   0.000
      +-B_vec_slow       |   0.000|   2.987|    6560|   0.000
    +-C                  |   0.000|   0.126|    6560|   0.000
    +-D                  |   0.000|   0.536|    6560|   0.000
    +-E                  |   0.000|   0.768|    6560|   0.000
    +-F                  |   0.000|   0.522|    6560|   0.000
preparation              |   0.000|   0.000|       1|   0.000
mx creation              |   0.000|   0.000|       1|   0.000

overall                  |    0.01|    5.74|   46082|-

如果你能在这两条线上工作，你可以期望在2/3秒内运行

这里：优化的代码：https://www.codepile.net/pile/8oDyGp6Q

网友

3楼 · 编辑于 2024-09-30 22:12:13

首先，您的代码具有O（n^3）复杂性，这不是什么大问题

很多代码都可以使用array programming来完成，这将大大加快速度，特别是使用numpy

我建议使用探查器查找不执行的代码，并开始在vectrial of loops中重写代码

我写了一个工具perf_tools对这类作品很有用。它可以为您提供一种性能驱动的解决方案

相关问题更多 >

编程相关推荐

热门问题

热门文章