Removing items from a list using multiprocessing

from multiprocessing import Pool

w_list = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]
budget = 299
cost = [100, 100, 100]

def cost_interior(w):
    total_cost = 0
    for item in range(0, len(w)):
        if w[item] == 1:
            total_cost = total_cost + cost[item]
    if total_cost > budget or total_cost < (0.5 * budget):
        w_list.remove(w)

def remove_unfit(unfit):
    if unfit is not None:
        w_list.remove(unfit)

if __name__ == "__main__":
    p = Pool(2)
    for w in w_list:
        p.apply_async(cost_interior, args=(w,), callback=remove_unfit)
    p.close()
    p.join()
    print(w_list)

2 Answers

User

Answer 1 · edited 2024-09-27 21:23:30

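You will get better performance by using Pool.map(function, iterable), which splits the iterable (w_list in this case) into chunks and applies the function to each chunk, with the chunks distributed across the pool's worker processes.

Another key optimization is to avoid calling remove() on the list repeatedly, because each call is expensive. Instead, first record which items should be removed, then build a new list in one pass.

I tested the code below, and it appears to run much faster (roughly 3-4x) than with a single worker process (to see the difference, use the commented-out process_pool = mp.Pool(1) line instead).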
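To illustrate why repeated remove() calls are costly, here is a minimal sketch that compares them with a single-pass rebuild. The helper names remove_in_place and rebuild, and the integer test data, are assumptions made for this illustration and are not part of the original answer; timings will vary by machine.

```python
import time


def remove_in_place(items, flags):
    """Repeated list.remove(): every call scans the list and shifts elements."""
    result = list(items)
    for item, flag in zip(items, flags):
        if flag:
            result.remove(item)
    return result


def rebuild(items, flags):
    """Single pass: keep only the items that are not flagged."""
    return [item for item, flag in zip(items, flags) if not flag]


# Hypothetical data: unique integers, with every odd number flagged for removal.
items = list(range(10_000))
flags = [i % 2 == 1 for i in items]

for fn in (remove_in_place, rebuild):
    start = time.perf_counter()
    kept = fn(items, flags)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: kept {len(kept)} items in {elapsed:.4f}s")
```

Both functions return the same list here; the difference is that each remove() call is linear in the list length, so the first approach grows roughly quadratically with the number of removals.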

import multiprocessing as mp

def cost_interior(w):
    budget = 299
    cost = [100, 100, 100]
    total_cost = 0
    for item in range(0, len(w)):
        if w[item] == 1:
            total_cost = total_cost + cost[item]
    if total_cost > budget or total_cost < (0.5 * budget):
        return True
    return False


def main():
    process_pool = mp.Pool(mp.cpu_count())
    #process_pool = mp.Pool(1)
    w_list = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]
    w_list = w_list*1000000
    should_remove = process_pool.map(cost_interior, w_list)
    process_pool.close()
    process_pool.join()
    # Build the filtered list in one pass instead of calling remove() repeatedly.
    w_list_new = [w for w, remove in zip(w_list, should_remove) if not remove]
    print(len(w_list_new))

if __name__ == "__main__":
    main()

User

Answer 2 · edited 2024-09-27 21:23:30

Unfortunately, there may not be a good way to do this.

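The problem with Python multiprocessing is that it works by creating a pool of additional processes. These processes are copies of the original process, so you typically end up with NUM_PROCS copies of your data, one per process. There are some caveats here, but if you see your memory usage grow, it is most likely because of these extra copies of your data.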
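One consequence of each worker holding its own copy is that a worker cannot modify the parent's list. The sketch below is a hedged illustration of that point; remove_in_worker is a hypothetical name introduced here, and the per-worker lengths it prints depend on how tasks happen to be distributed.

```python
import multiprocessing as mp

w_list = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]


def remove_in_worker(w):
    # Hypothetical helper: this mutates the worker process's own copy of
    # w_list, never the list owned by the parent process.
    if w in w_list:
        w_list.remove(w)
    return len(w_list)


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        # Iterate over a copy so the parent's list is not touched while mapping.
        worker_lengths = pool.map(remove_in_worker, list(w_list))
    print("list lengths seen inside the workers:", worker_lengths)
    print("parent list is unchanged:", w_list)
```

This is why calling w_list.remove(w) inside a worker function has no visible effect in the parent: results have to be returned from the worker and applied in the parent process.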

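In addition, for Python to communicate between processes, it has to serialize the arguments, pass them to the worker process, and then serialize the response back. In the example above, the processing done in the worker takes very few clock cycles. Pickling and sending the data probably takes longer than the actual work in the worker. If the processing time does not drop as the pool size increases, this is most likely the reason.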
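A rough way to get a feel for this overhead is to time the per-item work against a pickle round trip of the same data. This is only a sketch under assumptions: is_unfit is a hypothetical stand-in for the cost check in the question, and pickling the whole list at once is a proxy for what a pool does chunk by chunk, so the numbers are indicative at best.

```python
import pickle
import time

BUDGET = 299
COST = [100, 100, 100]


def is_unfit(w):
    """Hypothetical stand-in for the cost check from the question."""
    total = sum(c for bit, c in zip(w, COST) if bit == 1)
    return total > BUDGET or total < 0.5 * BUDGET


base = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]
# Copy each inner list so pickle cannot collapse repeated references
# to the same three objects into cheap back-references.
data = [list(w) for w in base * 50_000]

start = time.perf_counter()
flags = [is_unfit(w) for w in data]
work_seconds = time.perf_counter() - start

start = time.perf_counter()
payload = pickle.dumps(data)
restored = pickle.loads(payload)
pickle_seconds = time.perf_counter() - start

print(f"work:   {work_seconds:.4f}s")
print(f"pickle: {pickle_seconds:.4f}s for {len(payload)} bytes")
```

If the pickle time is comparable to or larger than the work time on your machine, adding more worker processes is unlikely to help for tasks this small.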

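You can try breaking the code up differently to see whether you can get something to work, but given the example above, I think you are unlikely to get a speedup. There are several different pool functions you can try (I like pool.imap), but they all share the same underlying problem.

You can read online about the issues with multiprocessing and the global interpreter lock. I have found Python multiprocessing very useful when the subtasks take a while, but for very small tasks the overhead is too high.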
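For reference, a minimal sketch of the pool.imap variant might look like the following. The names is_unfit and filter_with_imap and the chunksize value are assumptions chosen for illustration; a larger chunksize reduces the number of messages between processes but does not remove the serialization cost described above.

```python
import multiprocessing as mp

BUDGET = 299
COST = [100, 100, 100]


def is_unfit(w):
    """Hypothetical stand-in for the cost check from the question."""
    total = sum(c for bit, c in zip(w, COST) if bit == 1)
    return total > BUDGET or total < 0.5 * BUDGET


def filter_with_imap(w_list, processes=2, chunksize=1000):
    """Keep the items whose cost is within range, using pool.imap."""
    with mp.Pool(processes) as pool:
        # imap yields results lazily and in input order, so the flags
        # can be zipped directly against the original items.
        flags = pool.imap(is_unfit, w_list, chunksize=chunksize)
        return [w for w, unfit in zip(w_list, flags) if not unfit]


if __name__ == "__main__":
    data = [[1, 0, 1], [1, 1, 0], [1, 1, 1]] * 1000
    print(len(filter_with_imap(data)))
```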
