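<p>One way is to use a <code>Pool</code> and a <code>Queue</code>.</p>
<p>While there are items in the queue, the pool keeps working, and the main thread is not blocked.</p>
<p>Choose one of the following imports:</p>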
<pre><code>import multiprocessing as mp  # for process-based parallelization
import multiprocessing.dummy as mp  # for thread-based parallelization
</code></pre>
<p>Create the worker, the pool and the queue:</p>
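<pre><code>def worker_main(queue):
    # Passed to Pool as the initializer, so it runs once in each worker
    # and keeps taking items until it receives the None sentinel.
    while True:
        item = queue.get(True)  # blocks until an item is available
        if item is None:
            break
        account, pageList = item
        pull_data(pageList, account)

# The guard is needed when processes are started with "spawn"
# (the default on Windows and macOS).
if __name__ == "__main__":
    num_parallel_workers = 3  # adjust to your workload
    the_queue = mp.Queue()  # store the account ids and page lists here
    the_pool = mp.Pool(num_parallel_workers, worker_main, (the_queue,))
    # don't forget the trailing comma in (the_queue,)
    accountIDs = [100, 101, 103]
    thread_count = 3
    for account in accountIDs:
        list_of_page_lists = get_page_list(account, thread_count)
        for pg_list in list_of_page_lists:
            the_queue.put((account, pg_list))
    # ... the main thread stays free here and can queue more items ...
    # Until you send one None per worker, the pool will never end.
    # (You might want to leave it running for a while to receive more items.)
    for _ in range(num_parallel_workers):
        the_queue.put(None)
    the_pool.close()
    the_pool.join()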
</code></pre>
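<p>A minimal, self-contained sketch of the first approach that runs as-is. It assumes the thread-based <code>multiprocessing.dummy</code> import, and <code>get_page_list</code> and <code>pull_data</code> here are hypothetical stand-ins for your own functions (the records they produce are made up). <code>worker_main</code> is passed as the pool's <code>initializer</code> argument, which is why it runs once in every worker and can loop over the queue until it sees the sentinel.</p>

```python
import threading
import multiprocessing.dummy as mp  # thread-based variant

def get_page_list(account, thread_count):
    # Hypothetical stand-in: split pages 0..8 into thread_count lists.
    pages = list(range(9))
    return [pages[i::thread_count] for i in range(thread_count)]

def pull_data(pageList, account):
    # Hypothetical stand-in: one (account, page) record per page.
    return [(account, page) for page in pageList]

def worker_main(queue, results, lock):
    # Runs once in every worker (it is the pool initializer) and keeps
    # consuming items until it receives the None sentinel.
    while True:
        item = queue.get()  # blocks until an item is available
        if item is None:
            break
        account, pageList = item
        records = pull_data(pageList, account)
        with lock:  # all workers share the same results list
            results.extend(records)

def run(accountIDs, num_parallel_workers=3, thread_count=3):
    the_queue, results, lock = mp.Queue(), [], threading.Lock()
    the_pool = mp.Pool(num_parallel_workers, worker_main, (the_queue, results, lock))
    # The pool is already running; the main thread only feeds the queue.
    for account in accountIDs:
        for pg_list in get_page_list(account, thread_count):
            the_queue.put((account, pg_list))
    for _ in range(num_parallel_workers):
        the_queue.put(None)  # one sentinel per worker
    the_pool.close()
    the_pool.join()
    return sorted(results)

if __name__ == "__main__":
    records = run([100, 101, 103])
    print(len(records))  # 27: 3 accounts x 9 pages
```

<p>The lock is only there because every worker thread appends to the same list; with real processes you would send results back through a queue instead, as the comments in the next example describe.</p>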
<hr/>
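<p>Another option is to fill the queue first and then create the pool, so the workers only run while there are items in the queue.</p>
<p>Then, if more data arrives, create another queue and another pool:</p>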
<pre><code>import multiprocessing.dummy as mp
from queue import Empty
# If you are not using dummy (i.e. real processes), you will probably need a
# queue for the results too, as the processes cannot access the variables of
# the main process. Something like worker_main(input_queue, output_queue):
# and pull_data(pageList, account, output_queue)
# and mp.Pool(num_parallel_workers, worker_main, (in_queue, out_queue))
# and then you read the results from the output queue (with real processes,
# drain it before pool.join(): a process with unflushed queue data may not exit).

def worker_main(q):
    # Drain the pre-filled queue, then return.
    # get_nowait() avoids blocking forever if another worker takes
    # the last item between an empty() check and the get() call.
    while True:
        try:
            account, pageList = q.get_nowait()
        except Empty:
            break
        pull_data(pageList, account)

num_parallel_workers = 3  # adjust to your workload
the_queue = mp.Queue()  # store the account ids and page lists here
accountIDs = [100, 101, 103]
thread_count = 3
for account in accountIDs:
    list_of_page_lists = get_page_list(account, thread_count)
    for pg_list in list_of_page_lists:
        the_queue.put((account, pg_list))
the_pool = mp.Pool(num_parallel_workers, worker_main, (the_queue,))
# don't forget the trailing comma in (the_queue,)
the_pool.close()
the_pool.join()
del the_queue
del the_pool
</code></pre>
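<p>A runnable sketch of this second approach, including the output queue mentioned in the comments. <code>pull_data</code> is again a hypothetical stand-in (it just reports how many pages each account had), and <code>process_batch</code> is a hypothetical helper name. It is written against <code>multiprocessing.dummy</code>; switching to real processes would also need the <code>if __name__ == "__main__":</code> guard. Each batch gets its own queues and pool, and the results are read before <code>join()</code>, which is the order the multiprocessing docs recommend for queues shared with processes.</p>

```python
import multiprocessing.dummy as mp  # thread-based, as in the example above
from queue import Empty

def pull_data(pageList, account, output_queue):
    # Hypothetical stand-in: report how many pages each account had.
    output_queue.put((account, len(pageList)))

def worker_main(in_queue, out_queue):
    # Drain the pre-filled input queue, then return.
    while True:
        try:
            account, pageList = in_queue.get_nowait()
        except Empty:
            break
        pull_data(pageList, account, out_queue)

def process_batch(items, num_parallel_workers=3):
    """Fill a fresh queue, run a fresh pool over it, return the results."""
    in_queue, out_queue = mp.Queue(), mp.Queue()
    for item in items:
        in_queue.put(item)
    pool = mp.Pool(num_parallel_workers, worker_main, (in_queue, out_queue))
    # pull_data puts exactly one result per item, so read that many before
    # joining; the blocking get() waits for workers that are still running.
    results = [out_queue.get() for _ in range(len(items))]
    pool.close()
    pool.join()
    return sorted(results)

if __name__ == "__main__":
    first = process_batch([(100, [1, 2, 3]), (101, [4, 5])])
    # When more data arrives, run another batch with a new queue and pool.
    second = process_batch([(103, [6])])
    print(first, second)  # [(100, 3), (101, 2)] [(103, 1)]
```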