My Python multiprocessing code is slower than the serial one

Published 2024-10-02 14:26:47


I'm trying to implement an image-processing technique called "local thickness" in Python with OpenCV. It is already implemented in the image-analysis software ImageJ. For a binary image, the algorithm will:

  1. Skeletonize any white object (to create a skeleton, or ridge)
  2. For each skeleton/ridge point, find the distance to the nearest edge
  3. For every point within that distance, assign the distance as its "thickness" value, or update the thickness if this distance is larger than the existing value
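Step 3 above can be sketched with numpy as follows (a minimal illustration of the idea; this is neither the poster's code nor ImageJ's implementation):

```python
import numpy as np

def paint_local_thickness(shape, ridge_points):
    """Sketch of step 3: for each ridge point (i, j, r), write r*r into
    every pixel within distance r of it, keeping the maximum value."""
    s = np.zeros(shape)  # accumulates squared radii
    for i, j, r in ridge_points:
        r_int = int(np.ceil(r))
        # bounding box of the disk, clamped to the image
        i0, i1 = max(i - r_int, 0), min(i + r_int + 1, shape[0])
        j0, j1 = max(j - r_int, 0), min(j + r_int + 1, shape[1])
        ii, jj = np.ogrid[i0:i1, j0:j1]
        disk = (ii - i) ** 2 + (jj - j) ** 2 <= r * r
        patch = s[i0:i1, j0:j1]  # view into s, writes propagate back
        patch[disk] = np.maximum(patch[disk], r * r)
    return 2 * np.sqrt(s)  # thickness = diameter = 2 * radius

lt = paint_local_thickness((7, 7), [(3, 3, 2.0)])
```

A single ridge point at (3, 3) with radius 2 paints a thickness of 4 (the diameter) over the disk around it.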

The part I want to implement with multiprocessing is step 3. The original code is here. In Python, I split all the skeleton/ridge points into chunks and pass each chunk to a process. All processes communicate through a shared array that stores the thickness values. However, my multiprocessing code is slower than the serial code, even for any single process that handles only part of the data. Any ideas?

import numpy as np
import cv2 as cv
import matplotlib.pylab as plt
from skimage.morphology import medial_axis
from scipy.sparse import coo_matrix
import multiprocessing as mp
import time

def worker(sRidge_shared,iRidge,jRidge,rRidge,w,h,iR_worker,worker_id):
    print('Job starting for worker',worker_id)
    start=time.time()
    for iR in iR_worker:
        i = iRidge[iR]
        j = jRidge[iR]
        r = rRidge[iR]
        rSquared = int(r * r + 0.5)
        rInt = int(r)
        if rInt < r: rInt += 1
        # clamp the bounding box of the disk to the image
        iStart = max(i - rInt, 0)
        iStop = min(i + rInt, w - 1)
        jStart = max(j - rInt, 0)
        jStop = min(j + rInt, h - 1)
        for j1 in range(jStart,jStop):
            r1SquaredJ = (j1 - j) * (j1 - j)
            if r1SquaredJ <= rSquared:
                for i1 in range(iStart,iStop):
                    r1Squared = r1SquaredJ + (i1 - i) * (i1 - i)
                    if r1Squared <= rSquared:
                        # keep the largest squared radius seen so far
                        if rSquared > sRidge_shared[i1+j1*w]:
                            sRidge_shared[i1+j1*w] = rSquared
    print('Worker',worker_id,' finished job in ',time.time()-start, 's')



def Ridge_to_localthickness_parallel(ridgeimg):
    w, h = ridgeimg.shape
    M = coo_matrix(ridgeimg)
    nR = M.count_nonzero()
    iRidge = M.row
    jRidge = M.col
    rRidge = M.data
    sRidge = np.zeros((w*h,))
    sRidge_shared = mp.Array('d', sRidge)

    nproc = 10

    p = [mp.Process(target=worker,
                    args=(sRidge_shared,iRidge,jRidge,rRidge,w,h,range(i*nR//nproc,min((i+1)*nR//nproc,nR)),i))
                    for i in range(nproc)]
    for pc in p:
        pc.start()
    for pc in p:
        pc.join()

    a = np.frombuffer(sRidge_shared.get_obj())
    b = a.reshape((h,w))

    return 2*np.sqrt(b)

if __name__ == '__main__':
    mp.freeze_support()
    size = 1024

    img = np.zeros((size,size), np.uint8)
    cv.ellipse(img,(size//2,size//2),(size//3,size//5),0,0,360,255,-1)

    skel, distance = medial_axis(img, return_distance=True)
    dist_on_skel = distance * skel

    start = time.time()
    LT1 = Ridge_to_localthickness_parallel(dist_on_skel)
    print('Multiprocessing elapsed time: ', time.time() - start, 's')

Here are the results:

Serial elapsed time:  71.07010626792908 s
Job starting for worker 0
Job starting for worker 1
Job starting for worker 2
Job starting for worker 3
Job starting for worker 4
Job starting for worker 5
Job starting for worker 7
Job starting for worker 6
Job starting for worker 8
Job starting for worker 9
Worker 0  finished job in  167.6777663230896 s
Worker 9  finished job in  181.82518076896667 s
Worker 1  finished job in  211.21311926841736 s
Worker 8  finished job in  211.43014097213745 s
Worker 7  finished job in  235.29852747917175 s
Worker 2  finished job in  241.1481122970581 s
Worker 6  finished job in  242.3452320098877 s
Worker 3  finished job in  247.0727047920227 s
Worker 5  finished job in  245.52154970169067 s
Worker 4  finished job in  246.9776954650879 s
Multiprocessing elapsed time:  256.9716944694519 s
>>>

I'm running this on a Windows machine. I haven't tried multithreading, because I don't know how to access a shared array from multiple threads.

EDIT:

I used sharedmem with Thread/ThreadPoolExecutor. This turned out to be faster than the multiprocessing approach, but still not faster than the serial one.

Serial elapsed time:  67.51724791526794 s
Job starting for worker 0
Job starting for worker 1
Job starting for worker 2
Job starting for worker 3
Job starting for worker 4
Job starting for worker 6
Job starting for worker 5
Job starting for worker 7
Job starting for worker 8
Job starting for worker 9
Job starting for worker 10
Job starting for worker 11
Job starting for worker 12
Job starting for worker 13
Job starting for worker 14
Job starting for worker 15
Job starting for worker 16
Job starting for worker 17
Job starting for worker 18
Job starting for worker 19
Worker 2  finished job in  60.84959959983826 s
Worker 3  finished job in  63.856611013412476 s
Worker 4  finished job in  67.02961277961731 s
Worker 16  finished job in  68.00975942611694 s
Worker 15  finished job in  70.39874267578125 s
Worker 1  finished job in  75.65659618377686 s
Worker 14  finished job in  76.97173047065735 s
Worker 9  finished job in  78.4876492023468 s
Worker 0  finished job in  87.56459546089172 s
Worker 7  finished job in  89.86062669754028 s
Worker 17  finished job in  91.72178316116333 s
Worker 8  finished job in  94.22166323661804 s
Worker 19  finished job in  93.27084946632385 s
Worker 13  finished job in  95.02370047569275 s
Worker 5  finished job in  98.98063397407532 s
Worker 18  finished job in  97.57283663749695 s
Worker 10  finished job in  103.78466653823853 s
Worker 11  finished job in  105.19767212867737 s
Worker 6  finished job in  105.96561932563782 s
Worker 12  finished job in  105.5306978225708 s
Threading elapsed time:  106.97455644607544 s
>>>
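For reference, a threaded variant needs no special shared-memory type at all, since threads share the process's address space; a plain numpy array can be written from every worker (a minimal sketch of roughly what the edit above does, assumed rather than copied from it). Note that CPython's GIL serializes pure-Python bytecode, which is consistent with the threaded timings landing near serial.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Threads see the same memory, so an ordinary numpy array works as the
# shared accumulator; no mp.Array wrapper is needed.
shared = np.zeros(1000, dtype=np.float64)

def fill(lo, hi):
    # each thread writes a disjoint slice, so no locking is required
    shared[lo:hi] = np.arange(lo, hi)

with ThreadPoolExecutor(max_workers=4) as ex:
    step = 250
    futures = [ex.submit(fill, k, k + step) for k in range(0, 1000, step)]
    for f in futures:
        f.result()  # re-raise any exception from a worker
```

Because the heavy per-pixel loop here is pure Python, threads take turns holding the GIL and little real parallelism is gained.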

1 Answer

#1 · Posted 2024-10-02 14:26:47

Sharing an array across multiple processes has a huge cost.

Basically, this is how to estimate the multiprocessing time:

  • time to share all the data
  • compute time (which should be shorter than the serial compute time, since each process computes less)
  • time to aggregate the results

Here, I strongly suspect the first step accounts for most of the cost (large arrays).
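On top of the sharing cost, it is worth noting (my observation, beyond what the original answer says) that the default `mp.Array` is a lock-synchronized wrapper: every single `sRidge_shared[k]` read or write in the question's inner loop acquires that lock. Each worker can instead take one unsynchronized numpy view of the underlying buffer and index it at plain-numpy speed:

```python
import numpy as np
import multiprocessing as mp

# The default mp.Array carries a lock; element access through the
# wrapper acquires it every time.
shared = mp.Array('d', 16)

# Zero-copy numpy view of the same buffer; no lock on access.
view = np.frombuffer(shared.get_obj())

view[3] = 7.0  # fast write through the raw view
# the write is visible through the synchronized wrapper as well
```

The question's own code already uses `np.frombuffer(sRidge_shared.get_obj())` in the parent after joining; doing the same once per worker avoids paying the lock on every pixel.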

In general, you can easily multiprocess/multithread the parts of the code that separate cleanly (that do not need the full array).
