用python/numpy加速动态编程

#define P1 10 void optimalcost(double** costin, double** costout){ /* We assume that costout is initially filled with costin's values. */ float a,b,c,prevcost; for(i=0;i<400;i++){ for(j=0;j<400;j++){ a = prevcost+P1; b = costout[i][j-1]+P1; c = costout[i-1][j-1]; costout[i][j] += min(prevcost,min(b,c)); prevcost = costout[i][j]; } } } return;

from numba import autojit, jit import time import numpy as np @autojit def cost(left, right): height,width = left.shape cost = np.zeros((height,width,width)) for row in range(height): for x in range(width): for y in range(width): cost[row,x,y] = abs(left[row,x]-right[row,y]) return cost @autojit def optimalcosts(initcost): costs = zeros_like(initcost) for row in range(height): costs[row,:,:] = optimalcost(initcost[row]) return costs @autojit def optimalcost(cost): width1,width2 = cost.shape P1=10 prevcost = 0.0 M = np.array(cost) for i in range(1,width1): for j in range(1,width2): M[i,j] += min(M[i-1,j-1],prevcost+P1,M[i,j-1]+P1) prevcost = M[i,j] return M prob_size = 400 left = np.random.rand(prob_size,prob_size) right = np.random.rand(prob_size,prob_size) print '---------- Numba Time ----------' t = time.time() c = cost(left,right) optimalcost(c[100]) print time.time()-t print '---------- Native python Time --' t = time.time() c = cost.py_func(left,right) optimalcost.py_func(c[100]) print time.time()-t

2条回答

网友

1楼 · 编辑于 2024-10-01 07:24:53

如果您不难切换到Python的Anaconda发行版，那么您可以尝试使用Numba，对于这种特殊的简单动态算法，这可能会在不让您离开Python的情况下提供大量的加速。在

网友

2楼 · 编辑于 2024-10-01 07:24:53

Numpy通常不擅长迭代作业（尽管它确实有一些常用的迭代函数，如np.cumsum、np.cumprod、np.linalg.*等）。但对于上面的简单任务，如寻找最短路径（或最低能量路径），您可以通过思考同时可以计算的内容来将问题矢量化（同时尽量避免复制：

假设我们正在“行”方向（即水平方向）寻找最短路径，我们可以首先创建算法输入：

# The problem, 300 400*400 matrices
# Create infinitely high boundary so that we dont need to handle indexing "-1"
a = np.random.rand(300, 400, 402).astype('f')
a[:,:,::a.shape[2]-1] = np.inf

然后准备一些我们稍后使用的实用程序数组（创建需要固定的时间）：

^{pr2}$
最后进行计算并计时：
%%timeit for i in np.arange(a.shape[1]-1): A[i].min(2, T) B[i] += T
我的（超旧笔记本电脑）的计时结果是1.78秒，这已经快过3分钟了。我相信你可以通过优化内存布局和对齐（以某种方式）来改进更多（同时坚持numpy）。或者，您可以简单地使用multiprocessing.Pool。它很容易使用，而且这个问题很容易分解成更小的问题（通过在批处理轴上划分）。在

相关问题更多 >

编程相关推荐

热门问题

热门文章