Python multiprocessing: calling multiple methods

Published 2024-10-03 21:24:03


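I have a Python class that uses a multiprocessing pool to process and clean a large dataset. The method that does most of the cleaning is "cleanData", and it needs to call a second method, "processObservation". I am very new to Python multiprocessing, and I can't figure out how to make sure that "processObservation" is called from "cleanData" when new processes are spawned. How can I do this? I would prefer to keep all of these methods inside the class. I suspect it has something to do with the "__call__" definition, but I am not sure how to modify it appropriately.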

def processData(self, dataset, num_procs = mp.cpu_count()):
    dataSize = len(dataset)
    outputDict = dict()
    procs = mp.Pool(processes = num_procs, maxtasksperchild = 1)

    # Generate data chunks for processing.
    chunk = dataSize / num_procs
    dataChunk = [(i, i + chunk) for i in range(0, dataSize, chunk)]
    count = 1
    print 'Number of data chunks %d' %len(dataChunk)
    for i in dataChunk:
        procs.apply_async(self.dataCleaner, args = (dataset[i[0]:i[1]], count, ))
        count += 1
    procs.close()
    procs.join()

def cleanData(self, data, procNumber):
    print 'Spawning new process: %d' %os.getpid()
    tempDict = dict()
    print len(data)
    for obs in data:
        key, value = processObservation(obs)
        tempDict[key] = value
    cPickle.dump(tempDict, open( '../dataMP/cleanedData_' + str(procNumber) + '.p', 'wb'))

def __call__(self, dataset, count):
    return self.cleanData(dataset, count)
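
For comparison, here is a minimal Python 3 sketch of one way this could be arranged. It is not the asker's actual class. Several things in it are assumptions: the class name `DataCleaner` and the body of `process_observation` are placeholders (the post shows neither), the method names are switched to snake_case, and each worker returns its dictionary to the parent instead of pickling it to disk. Three points seem relevant to the question. The posted code submits `self.dataCleaner`, but the method is defined as `cleanData`. Inside `cleanData`, `processObservation(obs)` is called as a bare name rather than through `self`. And `apply_async` does not report worker errors unless `get()` is called on its result, so both problems can pass silently. In Python 3 a bound method can be sent to a pool as long as the instance itself can be pickled, so the `__call__` workaround should not be needed there.

```python
import multiprocessing as mp
import os


class DataCleaner:
    """Hypothetical class name; the original post does not show it."""

    def process_observation(self, obs):
        # Placeholder logic (an assumption): the real processObservation
        # is not shown in the post. It must return a (key, value) pair.
        return obs, obs * 2

    def clean_data(self, data, proc_number):
        # Runs inside a worker process.
        print("Worker %d (pid %d): %d observations"
              % (proc_number, os.getpid(), len(data)))
        cleaned = {}
        for obs in data:
            # Call through self so the method is found on the instance.
            key, value = self.process_observation(obs)
            cleaned[key] = value
        return cleaned

    def process_data(self, dataset, num_procs=None):
        num_procs = num_procs or mp.cpu_count()
        # Ceiling division, at least 1, so range() never gets a zero step.
        chunk = max(1, -(-len(dataset) // num_procs))
        bounds = [(i, i + chunk) for i in range(0, len(dataset), chunk)]
        output = {}
        with mp.Pool(processes=num_procs) as pool:
            pending = [
                pool.apply_async(self.clean_data, args=(dataset[lo:hi], n))
                for n, (lo, hi) in enumerate(bounds, start=1)
            ]
            for result in pending:
                # get() waits for the task and re-raises any worker exception.
                output.update(result.get())
        return output


if __name__ == "__main__":
    cleaner = DataCleaner()
    print(cleaner.process_data(list(range(10)), num_procs=2))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module. If the instance holds something that cannot be pickled (an open file handle, or the pool itself), submitting `self.clean_data` would fail, and a module-level function may be the simpler choice.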
