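<p>I am trying to run several machine learning algorithms (from scikit-learn) in parallel. I am using the Process class, with a variable shared between the processes to store the results.</p>
<p>Unfortunately, my code never finishes. Could it be a memory problem, since I am running 10 fairly heavy algorithms? Or is it just slow?</p>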
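<p>One way to tell a real hang from slow training might be to join with a timeout and list which processes are still alive. This is only a sketch: <code>still_running</code> and <code>_sleep_for</code> are hypothetical names used for illustration and are not part of the code in this question.</p>

```python
import time
from multiprocessing import Process


def still_running(procs, timeout):
    """Wait up to `timeout` seconds in total, then return the names of live processes."""
    deadline = time.monotonic() + timeout
    for p in procs:
        # Give each process only the time that is left before the shared deadline.
        p.join(max(0.0, deadline - time.monotonic()))
    return [p.name for p in procs if p.is_alive()]


def _sleep_for(seconds):
    """Hypothetical stand-in for a slow classifier."""
    time.sleep(seconds)


if __name__ == "__main__":
    procs = [Process(target=_sleep_for, args=(s,), name=f"sleep-{s}")
             for s in (0.1, 3)]
    for p in procs:
        p.start()
    # Report which workers have not finished after half a second.
    print(still_running(procs, timeout=0.5))
    # Clean up the slow worker so the script exits promptly.
    for p in procs:
        p.terminate()
        p.join()
```

<p>Calling this repeatedly with a growing timeout would show whether the set of live processes shrinks over time (slow) or stays the same (possibly stuck).</p>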
<p>I tried splitting the whole run into two parts (I thought this would make it faster), but it did not change anything.</p>
<p>Note that train_bow and test_bow are just vectors of floats.</p>
<pre><code>from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB, ComplementNB, BernoulliNB
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, VotingClassifier, ExtraTreesClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier as Knn
from sklearn.feature_extraction.text import TfidfVectorizer
#Custom class
from utilities.db_handler import *
from utilities.utils import *
from multiprocessing import Process, Manager
import json
import pickle as pkl
import os
import numpy as np
import pandas as pd
manager = Manager()
return_dict = manager.dict()
# Use a shared variable in order to get the results
proc = []
fncs1 = [random_forest_classification, SVC_classification, LinearSVC_classification, MultinomialNB_classification,
         LogisticRegression_classification]
fncs2 = [BernoulliNB_classification, GradientBoosting_classification,
         AdaBoost_classification, VotingClassifier_classification, ComplementNB_classification,
         ExtrExtraTrees_classification]
# Instantiate 2 sets of processes with their arguments. Each function
# writes its result to return_dict
for fn in fncs1:
    p = Process(target=fn, args=(train_bow, test_bow, label_train, label_test, return_dict))
    proc.append(p)
    p.start()
for p in proc:
    p.join()
for fn in fncs2:
    p = Process(target=fn, args=(train_bow, test_bow, label_train, label_test, return_dict))
    proc.append(p)
    p.start()
for p in proc:
    p.join()
# then pick the best of the results from return_dict and save them
</code></pre>
<p>This code gives me some warnings that come from the algorithms, but it does not show any error or warning related to multiprocessing.</p>
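<p>For reference, the pattern described above can be reduced to a small self-contained sketch. <code>toy_classification</code> and <code>run_in_parallel</code> are hypothetical stand-ins for the real <code>*_classification</code> functions and the loops in the question. The sketch also assumes that the process-launching code sits under an <code>if __name__ == "__main__":</code> guard, which the snippet in the question does not show.</p>

```python
from multiprocessing import Manager, Process


def toy_classification(name, values, results):
    """Hypothetical stand-in for one of the *_classification functions."""
    # Store a single score under a unique key in the shared mapping.
    results[name] = sum(values) / len(values)


def run_in_parallel(jobs, results):
    """Start one process per (name, values) job, then wait for all of them."""
    procs = [Process(target=toy_classification, args=(name, values, results))
             for name, values in jobs]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    # Creating the Manager inside the guard keeps child processes from
    # re-running this setup when they import the module (spawn start method).
    with Manager() as manager:
        shared = manager.dict()
        run_in_parallel([("a", [1.0, 2.0]), ("b", [3.0, 5.0])], shared)
        # Copy into a plain dict before the manager shuts down.
        print(dict(shared))
```

<p>If this toy version finishes while the real one does not, the cause is more likely in the individual classifiers (training time or memory) than in the Process and Manager plumbing.</p>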