Background: I am trying to create a simple bootstrap function that samples with replacement. I want to parallelize the function because I will eventually deploy it on data with millions of data points, and I want the sample sizes to be much larger. I have run other examples, such as the Mandelbrot example. In the code below you will see that I have a CPU version of the code, which runs fine.
I have read several references to get it up and running:
Problem: This is my first attempt at CUDA programming, and I believe I have everything set up correctly, but I am getting an error that I cannot seem to understand:
TypingError: cannot determine Numba type of <class 'object'>
I believe the relevant line of code is:
bootstrap_rand_gpu[threads_per_block, blocks_per_grid](rng_states, dt_arry_device, n_samp, out_mean_gpu)
Attempts to solve the problem: I won't go into great detail, but here is what I have tried:

I suspected it might have something to do with cuda.to_device(). I changed how I used it and also tried cuda.device_array_like(). I have used to_device() for all of the arguments and for only some of them; I have seen code examples where it is used for every argument and others where it is not, so I am not sure what to do.

I removed the GPU random number generator (create_xoroshiro128p_states) and just used a static value for testing.

I explicitly cast the integers with int() (rather than leaving them as-is). I honestly don't know why I did this; I read that Numba only supports a limited set of data types, so I made sure they were ints.
Apologies for the messy code. I'm in a bit over my head on this.
Below is the full code:
import numpy as np
from numpy import random
from numpy.random import randn
import pandas as pd
from timeit import default_timer as timer
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
from numba import *
def bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean):
    for i in range(boot_samp):
        rand_idx = random.randint(n_samp-1, size=(50))  # get random array of indices 0-49, with replacement
        out_mean[i] = dt_arry[rand_idx].mean()
@cuda.jit
def bootstrap_rand_gpu(rng_states, dt_arry, n_samp, out_mean):
    thread_id = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(thread_id, dt_arry.shape[0], stride):
        for k in range(0, n_samp-1, 1):
            rand_idx_arry[k] = int(xoroshiro128p_uniform_float32(rng_states, thread_id) * 49)
        out_mean[thread_id] = dt_arry[rand_idx_arry].mean()
mean = 10
rand_fluc = 3
n_samp = int(50)
boot_samp = int(1000)
dt_arry = (random.rand(n_samp)-.5)*rand_fluc + mean
out_mean_cpu = np.empty(boot_samp)
out_mean_gpu = np.empty(boot_samp)
##################
# RUN ON CPU
##################
start = timer()
bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean_cpu)
dt = timer() - start
print("CPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_cpu.mean()))
print("Bootstrap CPU in %f s" % dt)
##################
# RUN ON GPU
##################
threads_per_block = 64
blocks_per_grid = 24
#create random state for each state in the array
rng_states = create_xoroshiro128p_states(threads_per_block * blocks_per_grid, seed=1)
start = timer()
dt_arry_device = cuda.to_device(dt_arry)
out_mean_gpu_device = cuda.to_device(out_mean_gpu)
bootstrap_rand_gpu[threads_per_block, blocks_per_grid](rng_states, dt_arry_device, n_samp, out_mean_gpu_device)
out_mean_gpu_device.copy_to_host()
dt = timer() - start
print("GPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_gpu.mean()))
print("Bootstrap GPU in %f s" % dt)
You seem to have at least 4 issues:

1. In your kernel code, rand_idx_arry is undefined.
2. You can't use .mean() in CUDA device code.
3. Your kernel launch configuration parameters are reversed; the launch should be bootstrap_rand_gpu[blocks_per_grid, threads_per_block](...).
4. Your kernel had an incorrect range for its grid-stride loop: dt_arry.shape[0] is 50, so you are only populating the first 50 locations in the GPU output array. Like the host code, the range of this grid-stride loop should be the size of the output array (i.e. boot_samp).

There may be other issues as well, but when I refactored your code to address these, it seemed to run correctly:
Note: