Python generators and concurrency

Posted 2024-09-29 19:03:59

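I have a generator that yields file names and NLP tasks from NLTK/spaCy. I want to run these tasks in parallel, one per document, so that k documents are processed at the same time.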

How can I do this in Python? Is the multiprocessing package or asyncio the right choice? Or Pykka?

The results are added to an inverted index, as in the example below.

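import itertools

import spacy

# Assumption: any installed spaCy pipeline works; "en_core_web_sm" is only an example.
nlp = spacy.load("en_core_web_sm")


def gen_items():
    print("Yield 0")
    yield (0, 'Text 0')
    print("Yield 1")
    yield (1, 'text 1')
    print("Yield 2")
    yield (2, 'Text 2')


# Split the stream so that ids and texts can be consumed separately.
gen1, gen2 = itertools.tee(gen_items())
ids = (id_ for (id_, text) in gen1)
texts = (text for (id_, text) in gen2)
# n_threads was deprecated in spaCy 2.1 and later removed, so it is omitted here.
docs = nlp.pipe(texts, batch_size=50)
d = {}  # inverted index: token text -> set of document ids
for id_, doc in zip(ids, docs):
    print('id ' + str(id_))
    for token in doc:
        # print('token ' + str(token) + ' orth ' + token.orth_)
        if token.is_alpha and not token.is_stop and len(token.orth_) > 1:
            strtok = token.orth_.strip()
            # Create the set on first sight of the token, then record the id.
            d.setdefault(strtok, set()).add(id_)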
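
One possible way to keep k documents in flight while still reading the generator lazily is sketched below with the standard-library concurrent.futures module. This is only an illustration under stated assumptions: tokenize is a hypothetical stand-in for the real spaCy/NLTK step (it just splits on whitespace), and build_index and its executor_cls parameter are names invented for this sketch, not part of any library.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def gen_items():
    """Stand-in for the generator in the question: yields (id, text) pairs."""
    yield (0, "Text 0")
    yield (1, "text 1")
    yield (2, "Text 2")


def tokenize(item):
    """Hypothetical placeholder for the per-document NLP step.

    A real version would call spaCy or NLTK; this one only splits on whitespace.
    """
    doc_id, text = item
    tokens = {tok for tok in text.split() if tok.isalpha() and len(tok) > 1}
    return doc_id, tokens


def build_index(items, k, executor_cls=ThreadPoolExecutor):
    """Build token -> {doc ids}, keeping at most k documents in flight."""
    index = {}
    items = iter(items)
    with executor_cls(max_workers=k) as pool:
        # Submit only the first k documents, so the generator is not drained up front.
        pending = {pool.submit(tokenize, item) for item in islice(items, k)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                doc_id, tokens = future.result()
                # Merging happens in the calling thread, so the dict needs no lock.
                for tok in tokens:
                    index.setdefault(tok, set()).add(doc_id)
            # Refill: submit one new document for each one that finished.
            pending |= {pool.submit(tokenize, item) for item in islice(items, len(done))}
    return index


index = build_index(gen_items(), k=2)
print(index)
```

Threads are used here only so the sketch runs anywhere. For CPU-bound NLP work, passing executor_cls=ProcessPoolExecutor is the more likely fit; in that case the worker function must be defined at module level so it can be pickled, and on platforms that spawn new processes the call should sit under an if __name__ == "__main__": guard. Recent spaCy versions also accept an n_process argument to nlp.pipe, which may be simpler when spaCy is the only per-document step.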

0 answers

No answers yet
