I am learning to use numba to speed up Python code. With this code:
from numba import cuda, vectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
    return count1, count2

@vectorize(['float32(float32,float32)'], target = 'cuda')
def cint(img1, img2):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2
    return res

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res = cint(A,B)
I get the following error:
TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) found for signature:
  pixel_count (float32, float32)
There are 2 candidate implementations:
  - Of which 2 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-9169f440975d>: Line 4.
    With argument(s): '(float32, float32)':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Unknown attribute 'shape' of type float32

      File "<ipython-input-33-9169f440975d>", line 8:
      def pixel_count(img1,img2):
          <source elided>
          count2 = 0
          for i in range(img1.shape[0]):
          ^

      During: typing of get attribute at <ipython-input-33-9169f440975d> (8)

      File "<ipython-input-33-9169f440975d>", line 8:
      def pixel_count(img1,img2):
          <source elided>
          count2 = 0
          for i in range(img1.shape[0]):
          ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>)
During: typing of call at (3)
Edit
I changed the code to use guvectorize, as follows:
@guvectorize(['(float32[:],float32[:], float32)'], '(), () -> ()',target = 'cuda')
def cint(img1, img2, res):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res = cint(A, B)
which produces this error:
TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) found for signature:
  pixel_count (array(float32, 1d, A), array(float32, 1d, A))
There are 2 candidate implementations:
  - Of which 1 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line
    - With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD239D0>.
      tuple index out of range
      During: typing of static-get-item at (9)
      Enable logging at debug level for details.

      File "", line 9:
      def pixel_count(img1,img2):
          for i in range(img1.shape[0]):
              for j in range(img1.shape[1]):
              ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
  - Of which 1 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line
    - With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD52370>.
      tuple index out of range
      During: typing of static-get-item at (9)
      Enable logging at debug level for details.

      File "", line 9:
      def pixel_count(img1,img2):
          for i in range(img1.shape[0]):
              for j in range(img1.shape[1]):
              ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>)
During: typing of call at (23)
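How do I use cuda.jit together with the vectorize/guvectorize functions?
Edit 2
Thanks, everyone, for the replies. The goal is to find out how to solve this task on the GPU using numba. Since the matrices are small, the CPU code may well be faster; thank you for the tips on parallel computing, they were very useful. Do you have any other suggestions on how to port this code to the GPU? Many thanks.
I modified the code this way, but it always returns 0: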
from numba import cuda, vectorize, guvectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
    return count1, count2

@guvectorize(['(float32[:,:],float32[:,:], int16)'],
             '(n,m), (n,m)-> ()', target = 'cuda')
def cint(img1, img2, res):
    count1, count2 = pixel_count(img1, img2)
    res = count1 - count2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res1 = cint(A, B)
This does not use CUDA, but it may give you some ideas:
Pure NumPy (already vectorized):
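The code that accompanied this heading is not preserved here. As a minimal sketch of what a pure NumPy version could look like (the function name `count_diff_numpy` and the `threshold` parameter are assumptions, not the answer's original code):

```python
import numpy as np

def count_diff_numpy(img1, img2, threshold=200):
    # A boolean mask marks the pixels above the threshold;
    # count_nonzero counts the True entries
    count1 = int(np.count_nonzero(img1 > threshold))
    count2 = int(np.count_nonzero(img2 > threshold))
    return count1 - count2

A = np.random.rand(480, 640).astype(np.float32) * 255
B = np.random.rand(480, 640).astype(np.float32) * 255
res = count_diff_numpy(A, B)
```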
Simply wrap the NumPy operations in a JIT-compiled function:
Parallel over rows with Numba: