Importing a module (nltk) causes multiprocessing to hang

    from multiprocessing import Pool
    import time, requests
    #from nltk.corpus import stopwords # uncomment this and it hangs

    def gethtml(key, url):
        r = requests.get(url)
        return r.text

    def getnothing(key, url):
        return "nothing"

    if __name__ == '__main__':
        pool = Pool(processes=4)
        result = list()
        nruns = 4
        url = 'http://davidchao.typepad.com/webconferencingexpert/2013/08/gartners-magic-quadrant-for-cloud-infrastructure-as-a-service.html'
        for i in range(0, nruns):
            # print(gethtml(i, url))
            result.append(pool.apply_async(gethtml, [i, url]))
            # result.append(pool.apply_async(getnothing, [i, url]))
        pool.close()

        # monitor jobs until they complete
        running = nruns
        while running > 0:
            time.sleep(1)
            running = 0
            for run in result:
                if not run.ready():
                    running += 1
            print("processes still running: %d" % running)

        # print the first 40 characters of each result
        for i, run in enumerate(result):
            print("%d %s" % (i, run.get()[0:40]))

1 Answer

User

#1 · Posted 2024-10-02 12:38:27

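For anyone else with a similar problem, I would point you toward solutions that do not use the multiprocessing module:

1) Apache Spark, for scalability/flexibility. However, this does not seem to be a fix for Python multiprocessing itself. It looks like pyspark is also constrained by the Global Interpreter Lock?

2) "gevent" or "twisted", for general asynchronous processing in Python: http://sdiehl.github.io/gevent-tutorial/

3) Asynchronous requests: see "Asynchronous Requests with Python requests"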
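
As a rough illustration of the "avoid multiprocessing" idea above, here is a minimal sketch that fetches several pages concurrently with a thread pool from the standard library's concurrent.futures. This is a swapped-in technique, not gevent or twisted, and not something the original poster tested. Because the work is waiting on the network, threads can overlap the waits despite the Global Interpreter Lock, and no worker processes are forked. The names fetch_all and fake_fetch and the example.com URLs are hypothetical; in real use you would pass a function such as one wrapping requests.get(url).text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a thread pool; return {index: text}.

    `fetch` is any callable that takes a URL and returns a string. Passing it
    in keeps this sketch independent of a particular HTTP library.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the position of its URL in the input list.
        futures = {executor.submit(fetch, url): i for i, url in enumerate(urls)}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside fetch().
            results[futures[future]] = future.result()
    return results


def fake_fetch(url):
    # Hypothetical stand-in for something like `requests.get(url).text`.
    return "<html>%s</html>" % url


if __name__ == '__main__':
    urls = ['http://example.com/page%d' % i for i in range(4)]
    pages = fetch_all(urls, fake_fetch)
    for i in sorted(pages):
        print(i, pages[i][0:40])
```

Since everything runs inside one process, importing a module such as nltk at the top of the script does not interact with worker start-up here. Whether that fully avoids the hang described in the question is an assumption rather than something verified.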
