Keras autoencoder with grid search for dimensionality reduction

Posted 2024-09-30 22:13:56


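I am designing an autoencoder for dimensionality reduction of high-dimensional data. I have already built some simple autoencoder models with Keras for this purpose, but now I would like to use Sklearn's GridSearchCV to tune the hyperparameters automatically.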

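The metric I use to evaluate a model is how well the samples separate in a subsequent t-SNE. I expect two clusters to appear in the t-SNE plot, and I try to explain those clusters with different categorical variables (for example disease vs. normal, sex, and so on).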

To do this, I tried the following code:

## Imports (assumed; they are not shown in the original post)
import warnings
import pandas as pd
from sklearn.metrics import make_scorer, silhouette_score
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

## To create the model
## (X, y, X_train and key are defined elsewhere in the script)
def create_model (optimizer = 'Adam', n_layer = 2, factorArray = [1000, 100], n_bottleneck = 10):
    n_inputs = X.shape[1]
    visible = Input(shape=(n_inputs,))
    # encoder
    e = visible
    i = 0
    j = 0
    for i in range(n_layer):
        e = Dense(factorArray[i])(e)
        e = BatchNormalization()(e)
        e = LeakyReLU()(e)
    bottleneck = Dense(n_bottleneck)(e)
    # decoder: mirror of the encoder
    d = bottleneck
    factorArray = factorArray[::-1]
    for j in range(n_layer):
        d = Dense(factorArray[j])(d)
        d = BatchNormalization()(d)
        d = LeakyReLU()(d)
    output = Dense(n_inputs, activation='linear')(d)
    model = Model(inputs = visible, outputs = output)
    model = model.compile(optimizer = optimizer, loss ='mse')
    return model

model = KerasClassifier(build_fn=create_model)
#model = KerasRegressor(build_fn=create_model)

# compile autoencoder model

param_grid = {
          'optimizer': ['Adam'],
          'factorArray': [(1000,100)],
          'n_bottleneck' : [10],
          'epochs':[1,2],
          'batch_size':[128]
         }

gs = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring=make_scorer(silhouette_score),
    n_jobs=-1,
    verbose=2,
    refit = 'false'
)

with warnings.catch_warnings(record=True) as w:
    try:
        gs.fit(X, y)
    except ValueError:
        pass
    print(repr(w[-1].message))
# fit the autoencoder model to reconstruct input
history = gs.fit(X_train, X_train)
# alternative
# history = gs.fit(X_train,X_train, validation = (X_test,X_test))
df = pd.DataFrame(history.cv_results_)
df.to_csv('Grid_search_table_' + key + ".csv")

The program raised the following error:

Traceback (most recent call last):
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py", line 223, in fit
    return super(KerasClassifier, self).fit(x, y, **kwargs)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py", line 159, in fit
    if (losses.is_categorical_crossentropy(self.model.loss) and
AttributeError: 'NoneType' object has no attribute 'loss'

  FitFailedWarning)
/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:614: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py", line 223, in fit
    return super(KerasClassifier, self).fit(x, y, **kwargs)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py", line 159, in fit
    if (losses.is_categorical_crossentropy(self.model.loss) and
AttributeError: 'NoneType' object has no attribute 'loss'

Traceback (most recent call last):
  File "/var/spool/slurmd/job502071/slurm_script", line 267, in <module>
    history = gs.fit(X_train, X_train)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_search.py", line 841, in fit
    self._run_search(evaluate_candidates)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_search.py", line 1288, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_search.py", line 827, in evaluate_candidates
    _insert_error_scores(out, self.error_score)
  File "/mnt/dzl_bioinf/exec/python_lib_keras/keraML_3.7/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 297, in _insert_error_scores
    raise NotFittedError("All estimators failed to fit")
sklearn.exceptions.NotFittedError: All estimators failed to fit
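
A note on the `AttributeError` in the traceback: it is consistent with `create_model` returning `None`. Keras's `Model.compile` configures the model in place and returns `None`, so the line `model = model.compile(...)` throws the model away. The sketch below reproduces only that pattern. `FakeModel` is a hypothetical stand-in, not a Keras class, so the snippet runs without TensorFlow.

```python
class FakeModel:
    """Hypothetical stand-in that mimics only the in-place behaviour of compile()."""

    def __init__(self):
        self.loss = None

    def compile(self, optimizer, loss):
        # Configures the object in place and, like Keras, returns nothing (None).
        self.optimizer = optimizer
        self.loss = loss


def create_model_buggy():
    model = FakeModel()
    # Rebinding the name to compile()'s return value replaces the model with None.
    model = model.compile(optimizer="adam", loss="mse")
    return model


def create_model_fixed():
    model = FakeModel()
    # Call compile() for its side effect only and keep the original reference.
    model.compile(optimizer="adam", loss="mse")
    return model


broken = create_model_buggy()
fixed = create_model_fixed()
print(broken)      # None
print(fixed.loss)  # mse
```

In the question's code, the analogous change would be to call `model.compile(optimizer = optimizer, loss ='mse')` on its own line and then `return model`.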

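My questions are:

  1. Can anyone help me fix this bug, or tell me how to use GridSearchCV for an unsupervised task like mine?
  2. Is there a way to extract the bottleneck layer and visualize it while the grid search is running? A manually trained and tuned example is attached: An example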
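
For question 2, one possible approach outside of the grid search is to keep a reference to the bottleneck tensor and build a second `Model` that shares the encoder layers. The following is a minimal sketch, assuming TensorFlow 2.x. The function name `build_autoencoder`, the layer sizes, and the random input data are assumptions made for illustration and do not come from the question.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Model


def build_autoencoder(n_inputs, hidden=(8, 4), n_bottleneck=2, optimizer="adam"):
    """Return (autoencoder, encoder); both models share the same encoder layers."""
    visible = Input(shape=(n_inputs,))
    e = visible
    for units in hidden:
        e = Dense(units)(e)
        e = BatchNormalization()(e)
        e = LeakyReLU()(e)
    bottleneck = Dense(n_bottleneck, name="bottleneck")(e)

    d = bottleneck
    for units in reversed(hidden):
        d = Dense(units)(d)
        d = BatchNormalization()(d)
        d = LeakyReLU()(d)
    output = Dense(n_inputs, activation="linear")(d)

    autoencoder = Model(inputs=visible, outputs=output)
    # compile() works in place, so its return value is not assigned.
    autoencoder.compile(optimizer=optimizer, loss="mse")

    # Sub-model from the input to the bottleneck; it reuses the trained weights.
    encoder = Model(inputs=visible, outputs=bottleneck)
    return autoencoder, encoder


# Hypothetical data: 32 samples with 5 features, like the columns X1..X5.
rng = np.random.default_rng(0)
X_demo = rng.random((32, 5)).astype("float32")

autoencoder, encoder = build_autoencoder(n_inputs=X_demo.shape[1])
autoencoder.fit(X_demo, X_demo, epochs=1, batch_size=8, verbose=0)

# Low-dimensional codes that could be passed on to t-SNE or plotted directly.
codes = encoder.predict(X_demo, verbose=0)
print(codes.shape)
```

Doing the same for every candidate inside `GridSearchCV` would additionally require access to each fitted model, which this sketch does not cover.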

Sample data:

        X1         X2        X3        X4        X5
Label1  11.112964  8.633534  5.706326  8.432861  3.612596
Label5  9.02319    8.250181  5.636544  8.505976  3.488148
Label7  8.539526   7.338118  5.582104  8.626493  3.782542
Label8  9.622165   9.099065  5.855862  8.573742  3.466322
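
On scoring for question 1: `make_scorer` wraps a function that is called as `score_func(y_true, y_pred)`, while `silhouette_score` expects `(X, labels)`, i.e. the embedded points and a cluster or class label per point. To make clear what that score measures, here is a dependency-free sketch of the mean silhouette coefficient. The function name `mean_silhouette` and the toy 2-D embedding are assumptions for illustration; in practice `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for coordinate tuples and one label per point."""

    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    # Group point indices by label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for a single-point cluster
            continue
        # a: mean distance to the other points of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical 2-D embedding with two well-separated groups.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
matching = mean_silhouette(embedding, [0, 0, 1, 1])     # labels follow the groups
mismatched = mean_silhouette(embedding, [0, 1, 0, 1])   # labels cut across them
print(round(matching, 3), round(mismatched, 3))
```

A label that explains the two clusters gives a score close to 1, and a label unrelated to them gives a score near or below 0. `GridSearchCV` also accepts a callable with the signature `scorer(estimator, X, y)` for `scoring`, which may be a better fit here than `make_scorer`, because such a callable can embed `X` first and then score the embedding against the labels.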
