How can I improve the efficiency of the parallel part?

I am using Python 2.7 on Fedora 29. I am trying to train T >= 1000 weak classifiers, and I decided to train these classifiers in parallel and then aggregate their results afterwards. After running the code, I can see that they are not actually running in parallel. My question is how I can carry out this task more efficiently: which changes to the code would lead to faster output?

Inside the weak classifiers I call this function to do the bootstrap resampling:

def rand_bootsrap(n, size):
  import datetime
  import random
  import numpy as np
  # Re-seed in each call so worker processes do not share the same random state
  random.seed(datetime.datetime.now())
  # Draw `size` indices from 0..n-1 with replacement
  bootstrap = [random.choice(range(n)) for i in range(size)]
  return np.asarray(bootstrap)
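
For reference, the same with-replacement sampling can be done in a single vectorized call with numpy. This is only a sketch, not part of the original code, and the function name rand_bootstrap_np is made up here:

import numpy as np

def rand_bootstrap_np(n, size):
  # Draw `size` indices from 0..n-1 with replacement in one vectorized call
  # (each worker process would still need its own seed, e.g. via np.random.seed)
  return np.random.choice(n, size=size, replace=True)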

Here I train a single weak classifier:

def train_weak_clf(inputs):
  print mp.current_process()
  data_PU, data_P, data_U, train_label, NP, NU, H = inputs
  # Per-sample out-of-bag counters: accumulated class probabilities and visit counts
  n_oob = np.zeros(shape=(NP+NU,))
  f_oob = np.zeros(shape=(NP+NU, 2))

  ## Bootstrap resample
  bootstrap_sample_p = rand_bootsrap(NP, H)
  bootstrap_sample_u = rand_bootsrap(NU, H)

  data_bootstrap = np.concatenate((data_P[bootstrap_sample_p, :], data_U[bootstrap_sample_u, :]), axis=0)

  # Train model
  model = DecisionTreeClassifier(max_depth=None, max_features=None, criterion='gini')
  model.fit(data_bootstrap, train_label)

  ## Index for the out of the bag (oob) samples
  idx_oob = sorted(set(range(NP + NU)) - set(np.unique(bootstrap_sample_p)) - set(np.unique(bootstrap_sample_u + NP)))
  f_oob[idx_oob] += model.predict_proba(data_PU[idx_oob])
  n_oob[idx_oob] += 1
  return f_oob, n_oob

The main part:

if __name__ == '__main__':

  # load data and define the values for the elements of `inputs`
  T = 1000
  p = mp.Pool(processes=T)
  inputs = [data_PU, data_P, data_U, train_label, NP, NU, H]
  result = p.map(train_weak_clf, [inputs for i in range(T)])
  p.close()

  # get the results of the 1000 weak learners, aggregate them and compute the F1 score
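As a minimal sketch (not from the original post) of what that aggregation step might look like: the per-tree (f_oob, n_oob) pairs returned by train_weak_clf are summed, the probabilities are averaged over the trees that saw each sample out of bag, and the F1 score is computed. Here true_label is an assumed array holding the ground-truth labels of data_PU:

  import numpy as np
  from sklearn.metrics import f1_score

  f_oob_sum = sum(f for f, n in result)   # accumulated oob class probabilities
  n_oob_sum = sum(n for f, n in result)   # how many trees saw each sample oob

  # Average the positive-class probability over the oob trees
  # (samples that were never oob simply get a score of 0 here)
  predict_proba = f_oob_sum[:, 1] / np.maximum(n_oob_sum, 1)
  predictions = (predict_proba > 0.5).astype(int)

  print f1_score(true_label, predictions)  # true_label is assumed, not from the post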
