Answers to the question: How to improve the efficiency of a Python loop computation



I would like to speed up the following code, which deals with spherical modes. It is a simplification of my actual code (I don't want to oversimplify it, because that could lead to solutions that don't work for my actual problem): <pre><code>import math
import time

import numpy as np


def function_call(npp, nmax):
    matrix_a = np.random.rand(npp)
    matrix_b = np.random.rand(npp)
    a = np.random.rand()
    # np.complex_ was removed in NumPy 2.0; np.complex128 is the same dtype
    F = np.zeros((2 * npp, 2 * nmax * (nmax + 2)), dtype=np.complex128)
    npa = np.arange(npp)
    for n in range(1, nmax + 1, 1):
        a_n = np.sqrt(1 / (2 * np.pi * n * (n + 1)))
        for m in range(-n, n + 1, 1):
            b_m = (-1)**((np.abs(m) + m) / 2)
            p_mn = int(1 / 2 * (np.abs(m) + n + 1 / 2 * (1 - (-1)**(np.abs(m) + n))))
            alpha_mn = np.sqrt(((2 * n + 1) / 2) * math.factorial(n - np.abs(m)) / math.factorial(n + np.abs(m)))
            A_mn = np.zeros(npp)
            B_mn = np.zeros(npp)
            for p in range(p_mn, n + 1, 1):
                Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p) / math.factorial(2 * p - np.abs(m) - n)
                A_mn = A_mn + Cai_pmn * (np.cos(matrix_a))**(2 * p - np.abs(m) - n)
                B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (np.cos(matrix_a))**(np.abs(2 * p - np.abs(m) - n - 1))
            A_mn = A_mn / (2**n * math.factorial(n))
            B_mn = B_mn / (2**n * math.factorial(n))
            S_mn = alpha_mn * m * A_mn * np.sin(matrix_a)**np.abs(np.abs(m) - 1)
            D_mn = alpha_mn * (np.abs(m) * A_mn * np.cos(matrix_a) * (np.sin(matrix_a))**(np.abs(np.abs(m) - 1)) - B_mn * (np.sin(matrix_a))**(np.abs(m) + 1))
            h1 = 1j**(n + 1) * np.exp(-1j * a) / a
            h2 = 1j**n * np.exp(-1j * a) / a
            F_s1_theta = 1j * a_n * b_m * h1 * (S_mn * np.exp(1j * m * matrix_b))
            F_s1_phi = -a_n * b_m * h1 * (D_mn * np.exp(1j * m * matrix_b))
            F_s2_theta = a_n * b_m * h2 * (D_mn * np.exp(1j * m * matrix_b))
            F_s2_phi = 1j * a_n * b_m * h2 * (S_mn * np.exp(1j * m * matrix_b))
            j = 2 * (n * (n + 1) + m - 1)
            F[2 * npa, j] = F_s1_theta[npa]
            F[2 * npa + 1, j] = F_s1_phi[npa]
            j = 2 * (n * (n + 1) + m - 1) + 1
            F[2 * npa, j] = F_s2_theta[npa]
            F[2 * npa + 1, j] = F_s2_phi[npa]


prev_time_ep = time.time()
npp = 500
nmax = 80
function_call(npp, nmax)
print(" --- %s seconds ---" % (time.time() - prev_time_ep))
</code></pre>
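The first option I tried was to vectorize it (which took me some time, because it is not obvious). However, memory consumption grows quickly and it becomes inefficient. I have also tried Numba, and with it I managed to reduce the run time by a factor of 4, but I am looking for a bigger improvement if possible. I have also read that multiprocessing or Cython might be good options. Perhaps there is a way to vectorize it without the memory usage growing so quickly?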
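Written out, the innermost `p` loop computes the quantities below. Here θ stands for an entry of `matrix_a`, and x = cos θ and k = |m| + n are shorthand chosen for readability (they do not appear in the code). Expanding (x^2 - 1)^n with the binomial theorem and differentiating k times suggests that A_mn is the derivative that appears in Rodrigues' formula for the associated Legendre functions, and that B_mn is simply dA_mn/dx: the term with 2p = k carries a factor 0 in B_mn, so the absolute value in its exponent never matters. It also explains why `p_mn` equals ceil(k/2): lower powers of x vanish after k derivatives.

```latex
x = \cos\theta, \qquad k = |m| + n, \qquad
C_{pmn} = (-1)^{n+p} \binom{n}{p} \frac{(2p)!}{(2p-k)!}

A_{mn}(x) = \frac{1}{2^n\, n!} \sum_{p=p_{mn}}^{n} C_{pmn}\, x^{2p-k},
\qquad
B_{mn}(x) = \frac{1}{2^n\, n!} \sum_{p=p_{mn}}^{n} (2p-k)\, C_{pmn}\, x^{|2p-k-1|} = \frac{dA_{mn}}{dx}

(x^2-1)^n = \sum_{p=0}^{n} (-1)^{n+p} \binom{n}{p}\, x^{2p}
\;\Longrightarrow\;
A_{mn}(x) = \frac{1}{2^n\, n!}\, \frac{d^{\,n+|m|}}{dx^{\,n+|m|}} (x^2-1)^n,
\qquad p_{mn} = \lceil k/2 \rceil
```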


1 Answer

Anonymous, 1 day ago

Skilled in: python, mysql, java

I did some work on your code, and here is the benchmark. The bottleneck is the factorial computation: <pre><code>================== PerfTool ==================
task              |aver(s) |sum(s)  |count   |std
main loop         |   0.134|  10.712|      80|   0.101
  +-second loop   |   0.134|  10.712|      80|   0.101
  +-A             |   0.000|   0.245|    6560|   0.000
  +-B             |   0.001|   5.648|    6560|   0.001
  +-C             |   0.000|   0.541|    6560|   0.000
  +-D             |   0.000|   1.505|    6560|   0.000
  +-E             |   0.000|   1.769|    6560|   0.000
  +-F             |   0.000|   0.867|    6560|   0.000
mx creation       |   0.000|   0.000|       1|   0.000
preparation       |   0.000|   0.000|       1|   0.000
overall           |    0.03|   10.71|   39522|-
</code></pre>
The timed sections B and C are: <pre><code>with PerfTool('B'):
    for p in range(p_mn, n + 1, 1):
        Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p) / math.factorial(2 * p - np.abs(m) - n)
        A_mn = A_mn + Cai_pmn * (np.cos(matrix_a))**(2 * p - np.abs(m) - n)
        B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (np.cos(matrix_a))**(np.abs(2 * p - np.abs(m) - n - 1))
with PerfTool('C'):
    A_mn = A_mn / (2**n * math.factorial(n))
    B_mn = B_mn / (2**n * math.factorial(n))
</code></pre>
As you can see, most of the time is spent in B, so I added a simple cache, used instead of math.factorial so that the same values are not recomputed: <pre><code># assumed import: the answer calls a vectorized factorial() without showing where it comes from
from scipy.special import factorial

rng = np.arange(1, nmax + 1, 1)
cache = dict(zip(rng, factorial(rng)))


def get_factorial(w, cache=cache):
    # compute each factorial once, then reuse the stored value
    if w not in cache:
        cache[w] = math.factorial(w)
    return cache[w]
</code></pre>
Finally, B was refactored into B_vec, and this is the root of all evil! I have marked the two lines that take most of the time as B_vec_slow: <pre><code># assumed: cos(matrix_a) is computed once, before the n/m loops (not shown in the answer)
cos_matrix_a = np.cos(matrix_a)

with PerfTool('B_vec'):
    prng = np.arange(p_mn, n + 1)
    Cai_pmn_vec = get_factorial(n) * ((-1)**(n + prng)) / (factorial(prng) * factorial(n - prng)) * factorial(2 * prng) / factorial(2 * prng - np.abs(m) - n)
    with PerfTool('B_vec_slow'):
        # both lines build an (npp, len(prng)) temporary before summing over p
        A_mn_vec = Cai_pmn_vec * np.power(cos_matrix_a[:, np.newaxis], 2 * prng - np.abs(m) - n)
        B_mn_vec = (2 * prng - np.abs(m) - n) * Cai_pmn_vec * np.power(cos_matrix_a[:, np.newaxis], np.abs(2 * prng - np.abs(m) - n - 1))
    A_mn = np.sum(A_mn_vec, axis=1)
    B_mn = np.sum(B_mn_vec, axis=1)
</code></pre>
The result is: <pre><code>================== PerfTool ==================
task              |aver(s) |sum(s)  |count   |std
main loop         |   0.072|   5.736|      80|   0.052
  +-second loop   |   0.072|   5.735|      80|   0.052
  +-A             |   0.000|   0.194|    6560|   0.000
  +-B_vec         |   0.001|   3.490|    6560|   0.000
  +-B_vec_slow    |   0.000|   2.987|    6560|   0.000
  +-C             |   0.000|   0.126|    6560|   0.000
  +-D             |   0.000|   0.536|    6560|   0.000
  +-E             |   0.000|   0.768|    6560|   0.000
  +-F             |   0.000|   0.522|    6560|   0.000
preparation       |   0.000|   0.000|       1|   0.000
mx creation       |   0.000|   0.000|       1|   0.000
overall           |    0.01|    5.74|   46082|-
</code></pre>
If you can work on those two lines, you can expect it to run in roughly 2 to 3 seconds. The optimized code is here: <a href="https://www.codepile.net/pile/8oDyGp6Q" rel="nofollow noreferrer">https://www.codepile.net/pile/8oDyGp6Q</a>
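One possible way to attack the two B_vec_slow lines, sketched here under my own assumptions rather than taken from the linked code: the sum over `p` is a polynomial in x = cos(matrix_a) of degree n - |m|, and the B_mn sum is exactly its derivative (each term (2p - |m| - n) * Cai_pmn * x^(2p - |m| - n - 1) is the derivative of Cai_pmn * x^(2p - |m| - n), and the exponent-0 term contributes nothing). So the coefficients can be built once per (n, m) and evaluated with Horner's scheme through NumPy's `numpy.polynomial.polynomial.polyval` and `polyder`, which only allocate arrays of length npp instead of the (npp, number of p) temporaries created by `np.power(cos_matrix_a[:, np.newaxis], ...)`. The function names `ab_loop` and `ab_poly` are hypothetical, and `math.comb`/`math.perm` (Python 3.8+) stand in for the factorial quotients in `Cai_pmn`. The sketch has not been timed. Note also that both forms add large coefficients of alternating sign, so for large n (the question uses nmax=80) they can lose accuracy through cancellation; that is inherited from the expansion itself.

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def ab_loop(n, m, x):
    """Reference version of the question's p-loop (hypothetical helper).

    Cai_pmn is computed exactly with math.comb / math.perm instead of four
    math.factorial calls; the values agree up to floating-point rounding.
    """
    k = abs(m) + n
    A = np.zeros_like(x)
    B = np.zeros_like(x)
    for p in range((k + 1) // 2, n + 1):  # (k + 1) // 2 equals the question's p_mn
        e = 2 * p - k
        c = float((-1) ** (n + p) * math.comb(n, p) * math.perm(2 * p, k))
        A = A + c * x ** e
        B = B + e * c * x ** abs(e - 1)
    scale = float(2 ** n * math.factorial(n))
    return A / scale, B / scale


def ab_poly(n, m, x):
    """Same A_mn and B_mn via Horner evaluation (hypothetical helper).

    A_mn is a polynomial in x of degree n - |m| and B_mn is its derivative,
    so only len(x)-sized temporaries are created during evaluation.
    """
    k = abs(m) + n
    coeffs = np.zeros(n - abs(m) + 1)  # coeffs[e] multiplies x**e
    for p in range((k + 1) // 2, n + 1):
        coeffs[2 * p - k] = float((-1) ** (n + p) * math.comb(n, p) * math.perm(2 * p, k))
    scale = float(2 ** n * math.factorial(n))
    A = P.polyval(x, coeffs) / scale
    B = P.polyval(x, P.polyder(coeffs)) / scale
    return A, B


if __name__ == "__main__":
    # x plays the role of np.cos(matrix_a) in the question
    x = np.cos(np.random.rand(500))
    A_ref, B_ref = ab_loop(6, 2, x)
    A_new, B_new = ab_poly(6, 2, x)
    print(np.allclose(A_ref, A_new), np.allclose(B_ref, B_new))
```

Running the file prints whether the two versions agree for n = 6, m = 2. In the question's function, `ab_poly(n, m, np.cos(matrix_a))` would take the place of the `p` loop together with the two divisions by `2**n * math.factorial(n)`.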
