Batch normalization, batch feature extraction, and batch training

Published 2024-09-30 16:29:39


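The dataset is too large to load all at once, so I need to normalize it, extract features, and train on it in batches. To validate the idea, I chose iris as the dataset and used scikit-learn with Python. In the first step, I use StandardScaler.partial_fit() to normalize the batches: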

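from sklearn.preprocessing import StandardScaler

def batch_normalize(data):
    scaler = StandardScaler()
    dataset = []
    # First pass: accumulate the mean and variance over all batches.
    for i in data:
        scaler.partial_fit(i)
    # Second pass: transform each batch with the final statistics.
    for i in data:
        dataset.append(scaler.transform(i))

    return dataset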
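
As a quick sanity check on this step, the sketch below compares the statistics accumulated batch by batch with those from a single `fit` on the whole array. It assumes NumPy and scikit-learn are installed, and it uses `np.array_split` to form five batches purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Accumulate the running mean and variance one batch at a time.
incremental = StandardScaler()
for X_batch in np.array_split(X, 5):
    incremental.partial_fit(X_batch)

# Reference: fit on the full array in a single call.
full = StandardScaler().fit(X)

same_mean = np.allclose(incremental.mean_, full.mean_)
same_var = np.allclose(incremental.var_, full.var_)
print(same_mean, same_var)
```

If both comparisons hold, transforming each batch with the incrementally fitted scaler should match what a scaler fitted on the full data would produce.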

In the second step, I use IncrementalPCA.partial_fit() to extract features:

^{pr2}$
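
The code for this step did not survive in the source; only a placeholder remains. The sketch below is not the author's function. It is a hypothetical stand-in, named `extract_features_in_batches` here, that follows the same two-pass pattern as the normalization step. The choice of `n_components=2` and the decision to return one stacked array are both assumptions (the latter suggested by the way the main function passes the result to `train_test_split`).

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def extract_features_in_batches(batches, n_components=2):
    """Hypothetical stand-in for the missing feature-extraction step."""
    ipca = IncrementalPCA(n_components=n_components)
    # First pass: update the principal components batch by batch.
    for X_batch in batches:
        ipca.partial_fit(X_batch)
    # Second pass: project every batch, then stack into one array.
    return np.vstack([ipca.transform(X_batch) for X_batch in batches])


# Illustrative random data: 150 rows, 4 features, split into 5 batches.
rng = np.random.default_rng(0)
batches = np.array_split(rng.normal(size=(150, 4)), 5)
reduced = extract_features_in_batches(batches)
print(reduced.shape)
```

Note that each batch passed to `partial_fit` needs at least `n_components` rows.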

In the third step, I use MLPClassifier.partial_fit() to train on the data:

from sklearn.neural_network import MLPClassifier

def batch_classify(X_train, X_test, y_train, y_test):
    batch_mlp = MLPClassifier(hidden_layer_sizes=(50, 10), max_iter=500,
                    solver='sgd', alpha=1e-4, tol=1e-4, random_state=1,
                    learning_rate_init=.01)
    # Train incrementally; the full class list is required on the first call.
    for i, j in zip(X_train, y_train):
        batch_mlp.partial_fit(i, j, classes=[0, 1, 2])
    print("batch Test set score: %f" % batch_mlp.score(X_test, y_test))
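
One detail worth keeping in mind for this step: as far as the scikit-learn documentation describes it, each `partial_fit` call performs a single iteration over the data it is given, so `max_iter` does not drive repeated passes here. The sketch below is an illustration only. The shuffling, the five-way split, and the `n_epochs` value are assumptions, not part of the question. It loops over the batches several times to get more than one epoch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

iris = load_iris()
rng = np.random.default_rng(0)
order = rng.permutation(len(iris.target))  # iris is sorted by class, so shuffle
X_batches = np.array_split(iris.data[order], 5)
y_batches = np.array_split(iris.target[order], 5)

clf = MLPClassifier(hidden_layer_sizes=(50, 10), solver='sgd',
                    learning_rate_init=0.01, random_state=1)
classes = np.unique(iris.target)  # must be supplied on the first call

n_epochs = 20  # assumed value; tune for the real dataset
for _ in range(n_epochs):
    for X_batch, y_batch in zip(X_batches, y_batches):
        clf.partial_fit(X_batch, y_batch, classes=classes)

predictions = clf.predict(X_batches[0])
print(clf.classes_, predictions.shape)
```

With a dataset that does not fit in memory, the inner loop would read each batch from disk instead of from an in-memory list.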

Below is the main function that calls the three functions defined above:

from sklearn.model_selection import train_test_split

# chunks(seq, n) is a helper (not shown) that splits seq into pieces of length n.
def batch(iris, batch_size):
    dataset = batch_normalize(list(chunks(iris.data, batch_size)))
    dataset = batch_feature_extracrton(dataset)
    X_train, X_test, y_train, y_test = train_test_split(dataset, iris.target, test_size=0.2)
    batch_data = list(chunks(X_train, batch_size))
    batch_label = list(chunks(y_train, batch_size))
    batch_classify(batch_data, X_test, batch_label, y_test)
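
The `chunks` helper called above is not defined in the question. A minimal sketch, assuming it simply yields consecutive slices of a fixed size (the last one possibly shorter), might look like this:

```python
def chunks(seq, size):
    """Assumed behavior: yield consecutive slices of at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


parts = list(chunks(list(range(10)), 4))
print(parts)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because it relies only on `len` and slicing, the same helper would also work on NumPy arrays such as `iris.data`.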

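However, with this approach, every step, including normalization and feature extraction, requires me to go through all the batches twice. Is there another way to simplify the process? (For example, could a batch go directly from step 1 to step 3?)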
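
One possible simplification, offered as a sketch rather than a definitive answer: each stage needs statistics from the whole dataset before it can transform anything, so some repeated passes seem unavoidable, but the transforms can be applied on the fly instead of being stored. That gives one pass per stage (three in total) and no intermediate datasets. The batch construction, `n_components=2`, and the omission of a held-out test split are assumptions made to keep the example short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
rng = np.random.default_rng(0)
order = rng.permutation(len(iris.target))
X_batches = np.array_split(iris.data[order], 5)
y_batches = np.array_split(iris.target[order], 5)

scaler = StandardScaler()
ipca = IncrementalPCA(n_components=2)
mlp = MLPClassifier(hidden_layer_sizes=(50, 10), solver='sgd',
                    learning_rate_init=0.01, random_state=1)

# Pass 1: learn the scaling statistics.
for X_batch in X_batches:
    scaler.partial_fit(X_batch)

# Pass 2: learn the principal components on scaled batches.
for X_batch in X_batches:
    ipca.partial_fit(scaler.transform(X_batch))

# Pass 3: scale, project, and train, one batch at a time.
classes = np.unique(iris.target)
for X_batch, y_batch in zip(X_batches, y_batches):
    features = ipca.transform(scaler.transform(X_batch))
    mlp.partial_fit(features, y_batch, classes=classes)

n_predicted = mlp.predict(ipca.transform(scaler.transform(X_batches[0]))).shape[0]
print(ipca.n_components_, n_predicted)
```

Sending a batch straight from step 1 to step 3 in a single pass would mean transforming early batches with incomplete statistics, which is an approximation and may or may not be acceptable for a given dataset.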

