Python joblib Parallel: n_jobs=1 works, n_jobs=2 does not. No error, empty output. nmslib model pipe

Posted 2024-09-24 08:33:47


Below is code that creates tokens from a list of full names:

import en_core_web_sm
from datetime import datetime
from joblib import Parallel, delayed
from spacy.util import minibatch
from functools import partial

tok_text = [] # OUTPUT for our tokenised corpus
text = ["liam noah", "oliver william", "harper mason", "emma noah", "evelyn ethan", "mia lucas", "amelia benjamin", "isabella james", "sophia mason", "ava elijah"]

nlp = en_core_web_sm.load()

def process_texts(nlp, batch_id, text):
    print(f"{datetime.now()} Processing batch {batch_id}")
    for doc in nlp.pipe(text):
        tok = [t.text for t in doc if(t.is_ascii and not t.is_punct and not t.is_space)]
        tok_text.append(tok)

batch_size = 2

if __name__ == '__main__':
    print("Creating Parallel batches...")
    partitions = minibatch(text, size=batch_size)
    executor = Parallel(n_jobs=1)              #later will update it to n_jobs=2
    do = delayed(partial(process_texts, nlp))
    tasks = (do(i, batch) for i, batch in enumerate(partitions))
    executor(tasks)
    
print("Tokens:: ",tok_text)

The code above produces the following output (batches processed one after another):

Creating Parallel batches...
2021-01-26 14:20:18.852977 Processing batch 0
2021-01-26 14:20:18.870977 Processing batch 1
2021-01-26 14:20:18.886977 Processing batch 2
2021-01-26 14:20:18.900977 Processing batch 3
2021-01-26 14:20:18.918977 Processing batch 4
Tokens::  [['liam', 'noah'], ['oliver', 'william'], ['harper', 'mason'], ['emma', 'noah'], ['evelyn', 'ethan'], ['mia', 'lucas'], ['amelia', 'benjamin'], ['isabella', 'james'], ['sophia', 'mason'], ['ava', 'elijah']]

When I change that line to n_jobs=2:

executor = Parallel(n_jobs=2)

it runs without errors, but no work is done:

Creating Parallel batches...
Tokens::  []  <== Empty output

What am I missing?
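A likely explanation, sketched under assumptions: with n_jobs=1, joblib runs every task sequentially in the calling process, so process_texts appends to the same tok_text that is printed at the end. With n_jobs=2, joblib's default process-based backend (loky) runs the tasks in separate worker processes. Each worker appends to its own copy of tok_text, and because process_texts returns nothing, nothing is sent back to the parent, which is why the parent prints an empty list. The sketch below shows two common workarounds. It is not the original code: str.split() is a hypothetical stand-in for the spaCy nlp.pipe() loop (so it runs without en_core_web_sm), and chunks() is a hypothetical replacement for spacy.util.minibatch.

```python
from joblib import Parallel, delayed

text = ["liam noah", "oliver william", "harper mason", "emma noah", "evelyn ethan",
        "mia lucas", "amelia benjamin", "isabella james", "sophia mason", "ava elijah"]


def chunks(items, size):
    """Hypothetical stand-in for spacy.util.minibatch: yield consecutive slices."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tokenize_batch(batch):
    """Tokenize one batch and RETURN the tokens instead of mutating a global.

    str.split() is an assumed stand-in for the spaCy pipeline in the question.
    """
    return [name.split() for name in batch]


def tokenize_returning(texts, n_jobs=2, batch_size=2):
    # Workaround 1: collect return values. Parallel returns one result per task,
    # in submission order, whether the tasks ran in threads or in worker processes.
    per_batch = Parallel(n_jobs=n_jobs)(
        delayed(tokenize_batch)(batch) for batch in chunks(texts, batch_size)
    )
    # Flatten the list of per-batch results into one list of token lists.
    return [tokens for batch_tokens in per_batch for tokens in batch_tokens]


shared_tokens = []  # module-level list, playing the role of tok_text in the question


def tokenize_into_global(batch):
    # Mutates the module-level list; the caller only sees this if workers share memory.
    for name in batch:
        shared_tokens.append(name.split())


def tokenize_sharedmem(texts, n_jobs=2, batch_size=2):
    # Workaround 2: require="sharedmem" makes joblib use threads instead of
    # processes, so appends land in the caller's list. Append order across
    # batches is not guaranteed.
    shared_tokens.clear()
    Parallel(n_jobs=n_jobs, require="sharedmem")(
        delayed(tokenize_into_global)(batch) for batch in chunks(texts, batch_size)
    )
    return list(shared_tokens)


if __name__ == "__main__":
    print("Tokens:: ", tokenize_returning(text))
    print("Tokens:: ", tokenize_sharedmem(text))
```

Returning results is usually the more robust choice, since it works with any backend and keeps the output order. The sharedmem variant keeps the original "append to a global" style, but threads may give little speedup for pure-Python work because of the GIL.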

