The dataset is too large to load all at once, so I need to normalize, extract features, and train in batches. To validate the idea I use the iris dataset with scikit-learn in Python.
Step 1: I normalize the batches with StandardScaler.partial_fit():
from sklearn.preprocessing import StandardScaler

def batch_normalize(data):
    scaler = StandardScaler()
    # First pass: accumulate running mean/variance batch by batch.
    for batch in data:
        scaler.partial_fit(batch)
    # Second pass: transform each batch with the fitted statistics.
    return [scaler.transform(batch) for batch in data]
Step 2: I extract features with IncrementalPCA.partial_fit().
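The question does not show the feature-extraction helper it calls later, so here is a minimal sketch of what it might look like, assuming the same two-pass pattern as batch_normalize; the function name, `n_components` value, and the final stacking into one array (so the result can be passed to train_test_split) are my assumptions:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def batch_feature_extraction(batches, n_components=2):
    # Assumed helper, not shown in the question.
    ipca = IncrementalPCA(n_components=n_components)
    # First pass: fit the PCA incrementally on each batch.
    for batch in batches:
        ipca.partial_fit(batch)
    # Second pass: project each batch, then stack into one array
    # so it can be fed to train_test_split downstream.
    transformed = [ipca.transform(batch) for batch in batches]
    return np.vstack(transformed)
```

Note that IncrementalPCA requires each batch to contain at least `n_components` samples.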
Step 3: I train with MLPClassifier.partial_fit():
from sklearn.neural_network import MLPClassifier

def batch_classify(X_train, X_test, y_train, y_test):
    batch_mlp = MLPClassifier(hidden_layer_sizes=(50, 10), max_iter=500,
                              solver='sgd', alpha=1e-4, tol=1e-4,
                              random_state=1, learning_rate_init=.01)
    # Train incrementally; the classes argument is required on the
    # first call to partial_fit (passing it every time is harmless).
    for X_batch, y_batch in zip(X_train, y_train):
        batch_mlp.partial_fit(X_batch, y_batch, classes=[0, 1, 2])
    print("batch Test set score: %f" % batch_mlp.score(X_test, y_test))
Here is the main function that calls the three functions defined above:
from sklearn.model_selection import train_test_split

def batch(iris, batch_size):
    dataset = batch_normalize(list(chunks(iris.data, batch_size)))
    dataset = batch_feature_extraction(dataset)
    X_train, X_test, y_train, y_test = train_test_split(
        dataset, iris.target, test_size=0.2)
    batch_data = list(chunks(X_train, batch_size))
    batch_label = list(chunks(y_train, batch_size))
    batch_classify(batch_data, X_test, batch_label, y_test)
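The main function above relies on a `chunks` helper that is not shown in the question; a minimal sketch of what it presumably does (split a sequence into consecutive pieces of length batch_size) is:

```python
def chunks(seq, batch_size):
    # Assumed helper, not shown in the question: yield consecutive
    # slices of seq, each of length batch_size (the last may be shorter).
    for i in range(0, len(seq), batch_size):
        yield seq[i:i + batch_size]
```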
However, with this approach every step, both normalization and feature extraction, makes two passes over all the batches. Is there another way to simplify the pipeline (for example, so a batch can go directly from step 1 to step 3)?