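I want to speed up a method called translate_dirac_delta in my class. Following this demo https://jonasteuwen.github.io/numpy/python/multiprocessing/2017/01/07/multiprocessing-numpy-array.html, I use multiprocessing to fill an array through shared arrays. When I time the function call with t1 - t0, it appears to run about twice as fast on 4 cores. However, when I use the Unix time command, it is actually twice as slow. I know multiprocessing has some overhead, but I didn't expect this much. The module I'm using, ssht, is a Cython wrapper that isn't public, so I can't provide a full MWE.
Timing / calling function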
import time

import numpy as np
import pyssht as ssht  # cython wrapper

def translation(self, flm, pix_i, pix_j):
    t0 = time.time()
    glm = self.translate_dirac_delta(flm, pix_i, pix_j)
    t1 = time.time()
    print(t1 - t0)
    return glm

def calc_pixel_value(self, ind, pix_i, pix_j):
    # create Ylm corresponding to index
    ylm_harmonic = np.zeros((self.L * self.L), dtype=complex)
    ylm_harmonic[ind] = 1
    # convert Ylm from harmonic to pixel space
    ylm_pixel = ssht.inverse(ylm_harmonic, self.L, Method=self.method)
    # get value at pixel (i, j)
    ylm_omega = np.conj(ylm_pixel[pix_i, pix_j])
    return ylm_omega
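Since pyssht isn't public, one way to get a runnable MWE is to swap in a hypothetical stand-in that exposes only the two calls used here, elm2ind and inverse. This is not the pyssht API: elm2ind follows the ell*ell + ell + m indexing that ssht is assumed to use, and inverse is a placeholder that runs an FFT on an assumed (L, 2L-1) grid (the MW sampling size) just so each call does some work.

```python
import numpy as np


class FakeSSHT:
    """HYPOTHETICAL stand-in for the private pyssht module (not its real API)."""

    @staticmethod
    def elm2ind(ell, m):
        # Assumed ssht harmonic index convention: ind = ell^2 + ell + m
        return ell * ell + ell + m

    @staticmethod
    def inverse(flm, L, Method="MW"):
        # Placeholder, NOT a spherical harmonic transform: copy the
        # coefficients into an (L, 2L-1) grid and inverse-FFT it.
        grid = np.zeros((L, 2 * L - 1), dtype=complex)
        n = min(flm.size, grid.size)
        grid.flat[:n] = flm[:n]
        return np.fft.ifft2(grid)


# Use in place of `import pyssht as ssht` when building an MWE.
ssht = FakeSSHT()
```

With this in place, the serial and parallel versions can be timed against each other without access to pyssht, although the absolute timings will of course differ from the real transform.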
Original
sys 0m1.5s
def translate_dirac_delta(self, flm, pix_i, pix_j):
    flm_trans = self.complex_translation(flm, pix_i, pix_j)
    return flm_trans

def complex_translation(self, flm, pix_i, pix_j):
    for ell in range(self.L):
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            # calc_pixel_value also needs the pixel coordinates
            conj_pixel_val = self.calc_pixel_value(ind, pix_i, pix_j)
            flm[ind] = conj_pixel_val
    return flm
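Both versions write each flm[ind] exactly once. Assuming ssht's index convention is ind = ell*ell + ell + m (an assumption, since pyssht isn't public), each ell owns the contiguous block from ell**2 to (ell + 1)**2 - 1. That is why splitting the work by ell, as the parallel version does, lets workers write into the shared arrays without any locking. A small check of that claim:

```python
def elm2ind(ell, m):
    # Assumed ssht convention: ind = ell^2 + ell + m
    return ell * ell + ell + m


def indices_for_ell(ell):
    # All harmonic indices written for one value of ell (2*ell + 1 of them).
    return [elm2ind(ell, m) for m in range(-ell, ell + 1)]


def check_coverage(L):
    # Concatenating the per-ell blocks should give 0 .. L*L - 1 in order,
    # i.e. every index is written once and the blocks never overlap.
    seen = []
    for ell in range(L):
        seen.extend(indices_for_ell(ell))
    return seen == list(range(L * L))
```

For example, indices_for_ell(2) gives [4, 5, 6, 7, 8]. Note that each ell contributes 2*ell + 1 entries, so the tasks for larger ell carry more work than those for small ell.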
Parallel
sys 0m1.5s
import multiprocessing
import multiprocessing.sharedctypes  # not loaded by `import multiprocessing` alone

import numpy as np

def translate_dirac_delta(self, flm, pix_i, pix_j):
    # create arrays to store final and intermediate steps
    result_r = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    result_i = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    shared_array_r = multiprocessing.sharedctypes.RawArray(
        result_r._type_, result_r)
    shared_array_i = multiprocessing.sharedctypes.RawArray(
        result_i._type_, result_i)
    # ensure function declared before multiprocessing pool
    # (relies on the 'fork' start method, so workers inherit this global)
    global complex_func

    def complex_func(ell):
        # store real and imag parts separately
        tmp_r = np.ctypeslib.as_array(shared_array_r)
        tmp_i = np.ctypeslib.as_array(shared_array_i)
        # perform translation
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            conj_pixel_val = self.calc_pixel_value(
                ind, pix_i, pix_j)
            tmp_r[ind] = conj_pixel_val.real
            tmp_i[ind] = conj_pixel_val.imag

    # initialise pool and apply function
    with multiprocessing.Pool() as p:
        p.map(complex_func, range(self.L))
    # retrieve real and imag components
    result_r = np.ctypeslib.as_array(shared_array_r)
    result_i = np.ctypeslib.as_array(shared_array_i)
    # combine results
    return result_r + 1j * result_i
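The same shared-array idea can be written without declaring a global inside a method, by handing the RawArrays to each worker through a Pool initializer. The sketch below is a hypothetical rewrite, not the author's code: pixel_value stands in for self.calc_pixel_value, the index formula assumes ssht's ell*ell + ell + m convention, and the default start method is "fork" to match the Unix setup in the question.

```python
import multiprocessing as mp

import numpy as np

# Worker-side references to the shared buffers, set once per worker process.
_shared_r = None
_shared_i = None


def _init_worker(shared_r, shared_i):
    # Runs once in each worker: keep the shared arrays as module globals.
    global _shared_r, _shared_i
    _shared_r = shared_r
    _shared_i = shared_i


def pixel_value(ind):
    # HYPOTHETICAL stand-in for self.calc_pixel_value(ind, pix_i, pix_j).
    return complex(ind, -ind)


def fill_ell(ell):
    # View the shared buffers as NumPy arrays without copying.
    tmp_r = np.ctypeslib.as_array(_shared_r)
    tmp_i = np.ctypeslib.as_array(_shared_i)
    for m in range(-ell, ell + 1):
        ind = ell * ell + ell + m  # assumed ssht elm2ind convention
        val = pixel_value(ind)
        tmp_r[ind] = val.real
        tmp_i[ind] = val.imag


def translate_parallel(L, processes=4, start_method="fork"):
    # One float64 buffer each for the real and imaginary parts.
    shared_r = mp.RawArray("d", L * L)
    shared_i = mp.RawArray("d", L * L)
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes, initializer=_init_worker,
                  initargs=(shared_r, shared_i)) as pool:
        pool.map(fill_ell, range(L))
    result_r = np.ctypeslib.as_array(shared_r)
    result_i = np.ctypeslib.as_array(shared_i)
    return result_r + 1j * result_i


if __name__ == "__main__":
    print(translate_parallel(3))
```

Passing the arrays via initargs makes the data flow explicit and avoids closing over self, which would otherwise have to be inherited (or pickled) along with the whole object.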
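For a given process, user and sys time are the cumulative time that the process and its children spent executing program code and kernel calls, respectively. The time function (time.time()) returns wall time (real time), which behaves more like a wall clock: it lets you measure how much time elapsed between one moment and the next.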
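To see the difference between the two kinds of clock inside a single process, here is a small sketch using time.perf_counter (wall time, better suited to intervals than time.time) and time.process_time (CPU time of the current process only, children excluded). Sleeping advances wall time but hardly any CPU time; computing advances both.

```python
import time


def busy_cpu(n):
    # CPU-bound loop: consumes user time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def compare(sleep_s=0.2, n=200_000):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    time.sleep(sleep_s)  # waiting: wall clock advances, CPU time barely moves
    busy_cpu(n)          # computing: both clocks advance
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = compare()
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s")
```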
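Unsurprisingly, the multiprocessing solution uses more user time than the original: user time is summed over every worker process, and more time is spent copying data between the parent and child processes. Overall, though, your work still finishes in less real time.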
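A sketch of that accumulation, assuming a Unix system with the fork start method: os.times() reports the user and sys time of child processes that have been waited for, so after the pool is joined the workers' CPU time shows up there. With four workers each busy for roughly the same time t, children user time comes out near 4t while real time can stay near t plus startup overhead, which is the pattern behind the "twice as fast but twice as slow" observation. The busy task is a hypothetical placeholder for the real computation.

```python
import multiprocessing as mp
import os
import time


def busy(n):
    # Hypothetical CPU-bound task standing in for the real per-ell work.
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(n_tasks=4, n=2_000_000, processes=4):
    ctx = mp.get_context("fork")  # Unix only, as in the question
    t0 = os.times()
    wall0 = time.perf_counter()
    pool = ctx.Pool(processes)
    results = pool.map(busy, [n] * n_tasks)
    pool.close()
    pool.join()  # reap the workers so their CPU time is counted for children
    wall = time.perf_counter() - wall0
    t1 = os.times()
    child_user = t1.children_user - t0.children_user
    child_sys = t1.children_system - t0.children_system
    return results, wall, child_user, child_sys


if __name__ == "__main__":
    _, wall, cu, cs = measure()
    print(f"real {wall:.2f}s  children user {cu:.2f}s  children sys {cs:.2f}s")
```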
https://en.wikipedia.org/wiki/Time_%28Unix%29