Memory problems in grid search for image classification

Posted 2024-10-03 09:15:52


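I am running GridSearchCV with a stratified shuffle split on a VGG16 model. To pass the data to GridSearchCV, I convert the output of flow_from_dataframe into NumPy arrays, but I run out of memory because I am training on 10,000 images. How can I solve this? Alternatively, is there another way to use K-fold with 5 folds and perform hyperparameter tuning during training and fitting on each fold?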
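For scale, here is a rough estimate of what holding every image in a single NumPy array costs. It is only a sketch: it assumes 224x224 RGB images stored as 4-byte floats (float32 is an assumption about the generator's output dtype), and `array_bytes` is a hypothetical helper name.

```python
def array_bytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed for one dense array of images (assumes 4-byte floats)."""
    return n_images * height * width * channels * bytes_per_value


total = array_bytes(10_000)
print(f"{total:,} bytes, about {total / 1024**3:.1f} GiB")
```

Under these assumptions the pixel array alone takes roughly 6 GB before any model, gradient, or per-fold copy is allocated.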

Here is my code. train_data is a DataFrame and buildModel builds the VGG16 model.

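# Import paths are assumptions (older Keras with the scikit-learn wrapper); adjust to the installed version.
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score

Y = train_data[['label']]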

data = ImageDataGenerator(preprocessing_function = preprocess_input)
data_generator = data.flow_from_dataframe(train_data, directory = path,
                x_col = "filename", y_col = "label",
                class_mode = "binary", target_size=(224, 224), batch_size = len(train_data))
  
model = KerasClassifier(build_fn = buildModel, verbose=0)
  
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']

param_grid = dict(batch_size=batch_size, epochs=epochs,optimizer=optimizer)

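# Take x and y from the same batch. The generator shuffles by default, so two
# separate next() calls return differently ordered batches and the images
# would no longer line up with their labels.
x, y = next(data_generator)
print(x.shape)
print(y.shape)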

kfold_splits = 5
stratifiedSplit = StratifiedShuffleSplit(n_splits=kfold_splits, test_size=0.3)

grid = GridSearchCV(estimator=model,  
                n_jobs=-1, 
                verbose=1,
                return_train_score=True,
                cv=stratifiedSplit,
                param_grid=param_grid,)

grid_result = grid.fit(x, y)  # callbacks=[tbCallBack]

# Nested cross-validation: this repeats the whole grid search once per outer split.
nested_scores = cross_val_score(grid, x, y, cv=stratifiedSplit)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
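Apart from memory, the size of the search is worth counting. The sketch below tallies how many model fits the grid above implies. It assumes GridSearchCV's default `refit=True`, which adds one final fit on the full data; the variable names `n_combinations`, `search_fits`, and `nested_fits` are illustrative.

```python
from math import prod

# Same grid as in the question.
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"],
}
n_splits = 5

# One fit per parameter combination per split.
n_combinations = prod(len(values) for values in param_grid.values())
# Plus one refit of the best combination on all data (refit=True by default).
search_fits = n_combinations * n_splits + 1
# cross_val_score(grid, ...) reruns the whole search once per outer split.
nested_fits = search_fits * n_splits

print(n_combinations, search_fits, nested_fits)
```

Trimming the grid, or dropping the nested `cross_val_score` call when only the best parameters are needed, reduces this count directly.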

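One possible way to keep memory bounded, sketched under assumptions: split row indices instead of pixel arrays, and let each fold stream its images from disk in small batches. In the sketch below, `search_over_folds` and `fit_and_score` are hypothetical names. `fit_and_score` stands in for code that would build two `flow_from_dataframe` generators from `train_data.iloc[train_idx]` and `train_data.iloc[val_idx]`, train the model with the given parameters, and return a validation score. The dummy scorer at the bottom exists only so the loop can run on its own.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, StratifiedShuffleSplit


def search_over_folds(labels, param_grid, fit_and_score,
                      n_splits=5, test_size=0.3, seed=0):
    """Score every parameter combination on every split, using indices only."""
    labels = np.asarray(labels)
    splitter = StratifiedShuffleSplit(
        n_splits=n_splits, test_size=test_size, random_state=seed
    )
    # split() only needs the sample count and the labels, so a placeholder
    # array stands in for the image data. The splits are materialised once
    # so that every combination is evaluated on the same folds.
    placeholder = np.zeros(len(labels))
    splits = list(splitter.split(placeholder, labels))

    results = []
    for params in ParameterGrid(param_grid):
        scores = [fit_and_score(tr, va, params) for tr, va in splits]
        results.append({
            "params": params,
            "mean": float(np.mean(scores)),
            "std": float(np.std(scores)),
        })
    # Best mean score first.
    return sorted(results, key=lambda r: r["mean"], reverse=True)


def dummy_fit_and_score(train_idx, val_idx, params):
    # Placeholder only: a real version would train on rows train_idx and
    # evaluate on rows val_idx, reading images from disk batch by batch.
    return 1.0 / params["batch_size"]


labels = [0] * 50 + [1] * 50
grid = {"batch_size": [10, 20], "optimizer": ["SGD", "Adam"]}
ranking = search_over_folds(labels, grid, dummy_fit_and_score)
print(ranking[0])
```

Because only index arrays are passed around, peak memory is set by the generator's batch size rather than by the dataset size. The trade-off is that the loop no longer goes through GridSearchCV, so parallelism and result bookkeeping have to be handled by hand.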