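I am using Python 2.7 on Fedora 29. I am trying to train T >= 1000 weak classifiers, so I decided to train them in parallel and aggregate their results afterwards. After running the code, I can see that the classifiers are not actually being trained in parallel. How can I do this more efficiently? Which changes to the code below would make it produce its output faster?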
import datetime
import random
import multiprocessing as mp

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def rand_bootsrap(n, size):
    # Draw `size` indices from range(n) with replacement
    random.seed(datetime.datetime.now())
    bootstrap = [random.choice(range(n)) for i in range(size)]
    return np.asarray(bootstrap)


def train_weak_clf(inputs):
    print mp.current_process()
    data_PU, data_P, data_U, train_label, NP, NU, H = inputs
    n_oob = np.zeros(shape=(NP + NU,))
    f_oob = np.zeros(shape=(NP + NU, 2))
    # Bootstrap resample
    bootstrap_sample_p = rand_bootsrap(NP, H)
    bootstrap_sample_u = rand_bootsrap(NU, H)
    data_bootstrap = np.concatenate((data_P[bootstrap_sample_p, :], data_U[bootstrap_sample_u, :]), axis=0)
    # Train model
    model = DecisionTreeClassifier(max_depth=None, max_features=None, criterion='gini')
    model.fit(data_bootstrap, train_label)
    # Indices of the out-of-bag (oob) samples
    idx_oob = sorted(set(range(NP + NU)) - set(np.unique(bootstrap_sample_p)) - set(np.unique(bootstrap_sample_u + NP)))
    f_oob[idx_oob] += model.predict_proba(data_PU[idx_oob])
    n_oob[idx_oob] += 1
    return f_oob, n_oob


if __name__ == '__main__':
    # Load the data and define the value of each element of inputs
    T = 1000
    p = mp.Pool(processes=T)
    inputs = [data_PU, data_P, data_U, train_label, NP, NU, H]
    result = p.map(train_weak_clf, [inputs for i in range(T)])
    p.close()
    # Collect the results of the 1000 weak learners, aggregate them, and compute the F1 score
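
One direction worth trying, given here as a minimal sketch rather than a verified fix: `mp.Pool(processes=T)` starts 1000 worker processes, far more than a typical machine has cores, and the full `inputs` list is pickled and sent again for every task. The sketch below sizes the pool to the CPU count, hands the shared values to each worker once through a pool initializer, gives every task its own seed, and aggregates results as they arrive. The helper names (`init_worker`, `oob_indices`, `train_one`, `train_all`) and the `chunksize` value are hypothetical, and the model fit is replaced by a placeholder that only returns the out-of-bag indices, so only the scheduling pattern is shown.

```python
import multiprocessing as mp
import random

_shared = {}  # per-worker storage, filled once by init_worker (hypothetical helper)


def init_worker(NP, NU, H):
    # Runs once in each worker process. In the real script the data arrays
    # would be stored here as well, so they are sent once per worker
    # instead of being pickled again for each of the T tasks.
    _shared["NP"], _shared["NU"], _shared["H"] = NP, NU, H


def oob_indices(seed, NP, NU, H):
    # Draw H positive and H unlabeled indices with replacement, then return
    # the indices in range(NP + NU) that were never drawn (out-of-bag).
    rng = random.Random(seed)  # a distinct seed per task
    drawn = set(rng.randrange(NP) for _ in range(H))
    drawn |= set(NP + rng.randrange(NU) for _ in range(H))
    return sorted(set(range(NP + NU)) - drawn)


def train_one(seed):
    # Placeholder task: the DecisionTreeClassifier fit and the
    # predict_proba call from train_weak_clf would go here.
    return oob_indices(seed, _shared["NP"], _shared["NU"], _shared["H"])


def train_all(T, NP, NU, H):
    # One worker per CPU core instead of one worker per classifier.
    n_workers = min(T, mp.cpu_count())
    n_oob = [0] * (NP + NU)
    pool = mp.Pool(processes=n_workers, initializer=init_worker,
                   initargs=(NP, NU, H))
    try:
        # Aggregate each result as it arrives instead of keeping all T.
        for idx in pool.imap_unordered(train_one, range(T), chunksize=16):
            for i in idx:
                n_oob[i] += 1
    finally:
        pool.close()
        pool.join()
    return n_oob
```

As in the original script, `train_all(T, NP, NU, H)` should be called from inside the `if __name__ == '__main__':` block. Whether this actually helps depends on how long a single tree takes to fit compared with the cost of moving data between processes, so it is worth timing both versions with a small T first.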