TypingError: Failed in nopython mode pipeline (step: nopython frontend) Unknown attribute 'shape' of type float32

Posted 2024-09-29 01:37:54


I am learning to use Numba to speed up Python code. With this code:

from numba import cuda, vectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2


@vectorize(['float32(float32,float32)'], target = 'cuda')
def cint(img1, img2):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2
    return res

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A,B)

I get the following error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) found for signature: pixel_count (float32, float32) There are 2 candidate implementations: - Of which 2 did not match due to: Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-9169f440975d>: Line 4. With argument(s): '(float32, float32)': Rejected as the implementation raised a specific error: TypingError: Failed in nopython mode pipeline (step: nopython frontend) Unknown attribute 'shape' of type float32

 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^
 
 During: typing of get attribute at <ipython-input-33-9169f440975d> (8)
 
 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) During: typing of call at (3)

Edit

I changed the code to use guvectorize, as follows:

@guvectorize(['(float32[:],float32[:], float32)'], '(), () -> ()',target = 'cuda')
def cint(img1, img2, res):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2


A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A, B)

This produces the following error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) found for signature: pixel_count (array(float32, 1d, A), array(float32, 1d, A)) There are 2 candidate implementations:

  • Of which 1 did not match due to: Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1. With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))': Rejected as the implementation raised a specific error: TypingError: Failed in nopython mode pipeline (step: nopython frontend) Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD239D0>. tuple index out of range During: typing of static-get-item at (9) Enable logging at debug level for details. File "", line 9: def pixel_count(img1,img2): for i in range(img1.shape[0]): for j in range(img1.shape[1]): ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

  • Of which 1 did not match due to: Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1. With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))': Rejected as the implementation raised a specific error: TypingError: Failed in nopython mode pipeline (step: nopython frontend) Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD52370>. tuple index out of range During: typing of static-get-item at (9) Enable logging at debug level for details. File "", line 9: def pixel_count(img1,img2): for i in range(img1.shape[0]): for j in range(img1.shape[1]): ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) During: typing of call at (23)

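How can I use cuda.jit together with the vectorize/guvectorize functions?

Edit 2

Thank you all for the replies. The goal is to find out how to solve this task on the GPU with Numba. Since the matrices are small, the CPU code may well be faster; thanks for the tips on parallel computing, they were very useful. Do you have any further suggestions on how to port this code to the GPU? Many thanks.

I modified the code as follows, but it always returns 0: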

from numba import cuda, vectorize, guvectorize
import numpy as np


@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2

@guvectorize(['(float32[:,:],float32[:,:], int16)'],
             '(n,m), (n,m)-> ()', target = 'cuda')
def cint(img1, img2, res):
    count1, count2 = pixel_count(img1, img2)
    res = count1 - count2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res1 = cint(A, B)

1 answer
User
Reply #1 · posted 2024-09-29 01:37:54

This does not use CUDA, but it may give you some ideas:

Pure NumPy (vectorized):

A = np.random.rand(480, 640).astype(np.float32) * 255
B = np.random.rand(480, 640).astype(np.float32) * 255

%timeit (A > 200).sum() - (B > 200).sum()
478 µs ± 4.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
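To see that the boolean-mask expression above computes the same quantity as the nested loops in the question, here is a minimal sketch that compares the two on a small array. The helper names `count_above_loop` and `count_above_mask`, the reduced 48x64 size, and the fixed seed are assumptions made for illustration only; they are not part of the original post.

```python
import numpy as np

def count_above_loop(img, threshold=200):
    # Explicit double loop, mirroring pixel_count from the question.
    count = 0
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if img[i, j] > threshold:
                count += 1
    return count

def count_above_mask(img, threshold=200):
    # Vectorized form: build a boolean mask and count its True entries.
    return int(np.count_nonzero(img > threshold))

# Small arrays (hypothetical size) so the pure-Python loop stays fast.
rng = np.random.default_rng(0)
A = rng.random((48, 64), dtype=np.float32) * 255
B = rng.random((48, 64), dtype=np.float32) * 255

diff_loop = count_above_loop(A) - count_above_loop(B)
diff_mask = count_above_mask(A) - count_above_mask(B)
print(diff_loop == diff_mask)
```

`np.count_nonzero(mask)` and `mask.sum()` give the same count here; the former just makes the intent explicit.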

Simply wrap the NumPy operation in a JIT-compiled function:

import numba as nb

@nb.njit
def pixel_count_jit(img):
    return (img > 200).sum()

%timeit pixel_count_jit(A) - pixel_count_jit(B)
165 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Parallelized over rows with Numba:

@nb.njit(parallel=True)
def pixel_count_parallel(img):
    # One slot per row: the loop below indexes counts by row, so size it with shape[0]
    counts = np.empty(img.shape[0], dtype=np.uint32)
    for i in nb.prange(img.shape[0]):
        counts[i] = (img[i] > 200).sum()
    return counts.sum()

%timeit pixel_count_parallel(A) - pixel_count_parallel(B)
28.5 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
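As for the second edit in the question, where `cint` always returns 0: one likely cause is that `res = count1 - count2` only rebinds the local name `res` and never writes into the output that the gufunc hands to the function. Numba's guvectorize functions receive their result as an array argument and are expected to fill it in place. The sketch below illustrates that difference in plain Python and NumPy, without Numba or a GPU; `cint_rebind`, `cint_write`, and the tiny test arrays are hypothetical names and data chosen for illustration.

```python
import numpy as np

def cint_rebind(img1, img2, res):
    # Rebinds the local name `res`; the caller's output buffer is untouched.
    res = np.count_nonzero(img1 > 200) - np.count_nonzero(img2 > 200)

def cint_write(img1, img2, res):
    # Writes into the output buffer that the caller passed in.
    res[0] = np.count_nonzero(img1 > 200) - np.count_nonzero(img2 > 200)

A = np.array([[250, 10], [220, 30]], dtype=np.float32)  # 2 pixels above 200
B = np.array([[5, 10], [230, 30]], dtype=np.float32)    # 1 pixel above 200

# Stand-ins for the output array a gufunc would allocate and pass in.
out_rebind = np.zeros(1, dtype=np.int32)
out_write = np.zeros(1, dtype=np.int32)

cint_rebind(A, B, out_rebind)
cint_write(A, B, out_write)
print(out_rebind[0], out_write[0])  # 0 1
```

Carried over to the guvectorize version, this would presumably mean declaring the output as an array type in the signature (for example `int32[:]` rather than `int16`) and assigning with `res[0] = count1 - count2`. That has not been run on the CUDA target here, so treat it as a suggestion to try rather than a verified fix.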
