Below is the code that creates tokens from a list of full names:
import en_core_web_sm
from datetime import datetime
from joblib import Parallel, delayed
from spacy.util import minibatch
from functools import partial
tok_text = [] # OUTPUT for our tokenised corpus
text = ["liam noah", "oliver william", "harper mason", "emma noah", "evelyn ethan", "mia lucas", "amelia benjamin", "isabella james", "sophia mason", "ava elijah"]
nlp = en_core_web_sm.load()
def process_texts(nlp, batch_id, text):
    print(f"{datetime.now()} Processing batch {batch_id}")
    for doc in nlp.pipe(text):
        tok = [t.text for t in doc if t.is_ascii and not t.is_punct and not t.is_space]
        tok_text.append(tok)
batch_size = 2
if __name__ == '__main__':
    print("Creating Parallel batches...")
    partitions = minibatch(text, size=batch_size)
    executor = Parallel(n_jobs=1)  # will later change this to n_jobs=2
    do = delayed(partial(process_texts, nlp))
    tasks = (do(i, batch) for i, batch in enumerate(partitions))
    executor(tasks)
    print("Tokens:: ", tok_text)
The code above produces the following output (the batches run sequentially):
Creating Parallel batches...
2021-01-26 14:20:18.852977 Processing batch 0
2021-01-26 14:20:18.870977 Processing batch 1
2021-01-26 14:20:18.886977 Processing batch 2
2021-01-26 14:20:18.900977 Processing batch 3
2021-01-26 14:20:18.918977 Processing batch 4
Tokens:: [['liam', 'noah'], ['oliver', 'william'], ['harper', 'mason'], ['emma', 'noah'], ['evelyn', 'ethan'], ['mia', 'lucas'], ['amelia', 'benjamin'], ['isabella', 'james'], ['sophia', 'mason'], ['ava', 'elijah']]
When I change that line to n_jobs=2:
executor = Parallel(n_jobs=2)
it runs without errors, but no work appears to get done:
Creating Parallel batches...
Tokens:: [] <== Empty output
What am I missing?
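A likely explanation, offered as a sketch rather than a confirmed answer: with n_jobs=2 the batches are run in separate worker processes, and each worker appends to its own copy of the global tok_text, so the list in the parent process stays empty. The usual way around this is to have the worker function return its tokens and collect the return values in the parent. The example below shows that pattern using only the standard library. Note the assumptions: ProcessPoolExecutor stands in for joblib's Parallel, str.split() stands in for nlp.pipe(), and make_batches and tokenize_parallel are hypothetical helper names, not part of spaCy or joblib.

```python
from concurrent.futures import ProcessPoolExecutor

text = ["liam noah", "oliver william", "harper mason", "emma noah"]


def make_batches(items, size):
    """Hypothetical stand-in for spacy.util.minibatch: yield slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_texts(batch_id, batch):
    """Tokenise one batch and RETURN the result instead of appending to a global list.

    Assumption: str.split() replaces nlp.pipe() so the sketch runs without a spaCy model.
    """
    return [name.split() for name in batch]


def tokenize_parallel(texts, batch_size=2, n_jobs=2):
    """Run process_texts over the batches in worker processes and gather the returned tokens."""
    batches = list(make_batches(texts, batch_size))
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in the same order as the submitted batches.
        per_batch = list(pool.map(process_texts, range(len(batches)), batches))
    # Flatten the list of per-batch results into one list of token lists.
    return [tokens for batch_tokens in per_batch for tokens in batch_tokens]


if __name__ == "__main__":
    print("Tokens:: ", tokenize_parallel(text))
```

The same shape should carry over to the original code: calling a joblib Parallel object returns a list with each task's return value in task order, so process_texts can return its tokens and the result of executor(tasks) can be flattened in the parent process instead of reading tok_text.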