Importing a module (nltk) causes multiprocessing to hang

Posted 2024-10-02 12:38:27


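I traced a Python multiprocessing problem all the way down to the import of a module (nltk). Reproducible (hopefully) code is pasted below. This makes no sense to me; does anyone have any ideas?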

from multiprocessing import Pool
import time, requests
#from nltk.corpus import stopwords   # uncomment this and it hangs

def gethtml(key, url):
    r = requests.get(url)
    return r.text

def getnothing(key, url):
    return "nothing"

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = list()
    nruns = 4
    url = 'http://davidchao.typepad.com/webconferencingexpert/2013/08/gartners-magic-quadrant-for-cloud-infrastructure-as-a-service.html'
    for i in range(nruns):
        # print(gethtml(i, url))
        result.append(pool.apply_async(gethtml, [i, url]))
        # result.append(pool.apply_async(getnothing, [i, url]))
    pool.close()

    # monitor jobs until they complete
    running = nruns
    while running > 0:
        time.sleep(1)
        running = 0
        for run in result:
            if not run.ready():
                running += 1
        print("processes still running:", running)

    # print results
    for i, run in enumerate(result):
        print(i, run.get()[0:40])

Note that the 'getnothing' function works. The hang only appears with the combination of the nltk import and the requests call. Sigh.


1 answer
User
Reply 1 · Posted 2024-10-02 12:38:27

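I would point anyone else with a similar problem to solutions that do not use the multiprocessing module:

1) Apache Spark, for scalability and flexibility. However, this does not seem to be a solution to the Python multiprocessing problem itself. It looks like pyspark is also limited by the Global Interpreter Lock?

2) 'gevent' or 'twisted', for general asynchronous processing in Python: http://sdiehl.github.io/gevent-tutorial/

3) Asynchronous requests; see "Asynchronous Requests with Python requests"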
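
As a rough illustration of the "no multiprocessing" idea, here is a minimal sketch that runs the downloads on a thread pool from the standard library's concurrent.futures. This is a substitute technique, not one of the libraries listed above and not something taken from the question. The names fetch_all, fetch and workers are hypothetical, and urllib.request stands in for requests so the snippet has no third-party dependency. Threads run inside the one existing process, so no child process is started; whether that avoids the hang in the original environment has not been checked here.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def gethtml(key, url):
    # Assumption: urllib.request replaces requests.get(url).text from the question.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=gethtml, workers=4):
    """Call fetch(key, url) for every URL on a thread pool.

    fetch_all, fetch and workers are illustrative names, not part of any library.
    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately; the work runs on the pool's threads.
        futures = [pool.submit(fetch, key, url) for key, url in enumerate(urls)]
        # result() blocks until that job is done and re-raises any exception.
        return [future.result() for future in futures]
```

Calling fetch_all([url] * 4) would mirror the four jobs in the question. Passing a different callable as fetch, for example the question's getnothing, makes it easy to check the pool logic without touching the network.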
