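I have a binary classification problem. I have been training on it quite successfully by passing my data through a pre-trained embedding, then several CNNs in parallel, pooling the results, and then using a Dense layer to predict the class. But when I add an RNN layer after the CNNs, training fails completely. The code follows (this is a long post).
This is the CNN-only model, which works. My input is vectors of length 100.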
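from keras import layers as L
# weights: pre-trained embedding matrix of shape (vocab_size, embedding_dim)
m = 100  # filters per convolution, as implied by the summary below
inputs = L.Input(shape=(100,))  # shape must be a tuple
embedding = L.Embedding(input_dim=weights.shape[0],
                        output_dim=weights.shape[1],
                        input_length=100,
                        weights=[weights],
                        trainable=False)(inputs)
# The summary lists a Dropout layer here; dropout_rate is not given in the post.
dropout = L.Dropout(dropout_rate)(embedding)
# Three parallel convolutions with kernel sizes 3, 4 and 5
conv3 = L.Conv1D(m, kernel_size=3)(dropout)
conv4 = L.Conv1D(m, kernel_size=4)(dropout)
conv5 = L.Conv1D(m, kernel_size=5)(dropout)
# Pool over the full conv output length, leaving one time step per branch
maxpool3 = L.MaxPool1D(pool_size=100 - 3 + 1, strides=1)(conv3)
maxpool4 = L.MaxPool1D(pool_size=100 - 4 + 1, strides=1)(conv4)
maxpool5 = L.MaxPool1D(pool_size=100 - 5 + 1, strides=1)(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3, maxpool4, maxpool5])
flattened = L.Flatten()(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(flattened)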
The summary is as follows:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_25 (InputLayer)            (None, 100)           0
____________________________________________________________________________________________________
embedding_25 (Embedding)         (None, 100, 50)       451300      input_25[0][0]
____________________________________________________________________________________________________
dropout_25 (Dropout)             (None, 100, 50)       0           embedding_25[0][0]
____________________________________________________________________________________________________
conv1d_73 (Conv1D)               (None, 98, 100)       15100       dropout_25[0][0]
____________________________________________________________________________________________________
conv1d_74 (Conv1D)               (None, 97, 100)       20100       dropout_25[0][0]
____________________________________________________________________________________________________
conv1d_75 (Conv1D)               (None, 96, 100)       25100       dropout_25[0][0]
____________________________________________________________________________________________________
max_pooling1d_73 (MaxPooling1D)  (None, 1, 100)        0           conv1d_73[0][0]
____________________________________________________________________________________________________
max_pooling1d_74 (MaxPooling1D)  (None, 1, 100)        0           conv1d_74[0][0]
____________________________________________________________________________________________________
max_pooling1d_75 (MaxPooling1D)  (None, 1, 100)        0           conv1d_75[0][0]
____________________________________________________________________________________________________
concatenate_25 (Concatenate)     (None, 3, 100)        0           max_pooling1d_73[0][0]
                                                                   max_pooling1d_74[0][0]
                                                                   max_pooling1d_75[0][0]
____________________________________________________________________________________________________
flatten_25 (Flatten)             (None, 300)           0           concatenate_25[0][0]
____________________________________________________________________________________________________
dense_47 (Dense)                 (None, 1)             301         flatten_25[0][0]
====================================================================================================
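The output lengths in the summary above follow from the usual "valid" padding arithmetic. A minimal sketch in plain Python, assuming stride-1 convolutions and no padding (the helper names are illustrative, not part of Keras):

```python
def conv1d_valid_length(steps: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a 1D convolution with 'valid' padding."""
    return (steps - kernel_size) // stride + 1


def pool1d_valid_length(steps: int, pool_size: int, stride: int) -> int:
    """Output length of 1D max pooling with 'valid' padding."""
    return (steps - pool_size) // stride + 1


SEQUENCE_LENGTH = 100  # input length stated in the question

for kernel_size in (3, 4, 5):
    conv_steps = conv1d_valid_length(SEQUENCE_LENGTH, kernel_size)
    # pool_size = 100 - kernel_size + 1 equals conv_steps,
    # so a single pooling window covers the whole sequence.
    pooled_steps = pool1d_valid_length(conv_steps, pool_size=conv_steps, stride=1)
    print(kernel_size, conv_steps, pooled_steps)
```

Each branch therefore collapses to a single time step of 100 features, which is why the concatenated tensor has shape (None, 3, 100).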
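As I said above, this works quite well, reaching good accuracy after only 3-4 epochs. However, my thinking is that CNNs identify local patterns, and if I also want to model how those patterns relate to one another over longer distances within a given input vector, I should use some flavor of RNN after the convolutions. So I tried changing the pool_size of the MaxPooling1D layers after the convolutions, removing the Flatten, and passing the Concatenate layer to an RNN instead. For example: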
maxpool3 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(49,), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3, maxpool4, maxpool5])
rnn = L.SimpleRNN(75)(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(rnn)
The summary is now:
max_pooling1d_95 (MaxPooling1D)  (None, 50, 100)       0           conv1d_97[0][0]
____________________________________________________________________________________________________
max_pooling1d_96 (MaxPooling1D)  (None, 50, 100)       0           conv1d_98[0][0]
____________________________________________________________________________________________________
max_pooling1d_97 (MaxPooling1D)  (None, 49, 100)       0           conv1d_99[0][0]
____________________________________________________________________________________________________
concatenate_32 (Concatenate)     (None, 149, 100)      0           max_pooling1d_95[0][0]
                                                                   max_pooling1d_96[0][0]
                                                                   max_pooling1d_97[0][0]
____________________________________________________________________________________________________
simple_rnn_5 (SimpleRNN)         (None, 75)            13200       concatenate_32[0][0]
____________________________________________________________________________________________________
dense_51 (Dense)                 (None, 1)             76          simple_rnn_5[0][0]
====================================================================================================
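When I train the model, every prediction comes out exactly the same: the ratio of class [1] to class [0]. I have read several papers where people use this scheme successfully, so I am clearly doing something wrong, and I would bet it is an embarrassingly silly mistake. Would anyone be willing to help diagnose it?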
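The first thing you could try is concatenating along the feature axis instead of the time axis, that is, Concatenate(axis=2) rather than axis=1.
(Note that you must make sure maxpool3, maxpool4 and maxpool5 have the same number of "time" steps, i.e. maxpool3.shape[1] == maxpool4.shape[1] == maxpool5.shape[1].)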
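A minimal sketch of the shape bookkeeping behind that suggestion, in plain Python rather than Keras. The target of 50 time steps and the resulting pool sizes are illustrative choices, not values from the original post, and the helper names are hypothetical:

```python
from typing import List, Optional, Tuple

Shape = Tuple[Optional[int], int, int]  # (batch, time steps, features)


def pool_size_for(conv_steps: int, target_steps: int) -> int:
    """Pool size that leaves target_steps outputs at stride 1, 'valid' padding."""
    if target_steps > conv_steps:
        raise ValueError("cannot produce more steps than the input has")
    return conv_steps - target_steps + 1


def concat_features(shapes: List[Shape]) -> Shape:
    """Shape after concatenating along the feature axis (axis=2)."""
    batch, steps, _ = shapes[0]
    if any(s[0] != batch or s[1] != steps for s in shapes):
        raise ValueError("all branches need the same number of time steps")
    return (batch, steps, sum(s[2] for s in shapes))


conv_steps = [98, 97, 96]  # Conv1D output lengths from the question's summary
target_steps = 50          # illustrative choice, not from the original post
pool_sizes = [pool_size_for(n, target_steps) for n in conv_steps]
branches: List[Shape] = [(None, target_steps, 100)] * 3

print(pool_sizes)                 # [49, 48, 47]
print(concat_features(branches))  # (None, 50, 300)
```

In Keras terms this corresponds to L.Concatenate(axis=2)([maxpool3, maxpool4, maxpool5]). The RNN then sees 50 time steps of 300 features each, rather than 149 steps in which the three branches are laid end to end.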
Second, with 50 time steps, give LSTM or GRU a chance, since they can capture longer temporal dependencies better than SimpleRNN.
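To gauge what that swap costs, here is a rough sketch of the parameter counts using the textbook formulas. The SimpleRNN figure reproduces the 13200 reported in the question's summary. The GRU formula assumes the classic single-bias variant, so implementations that keep a separate recurrent bias will report slightly more. The function names are illustrative, not Keras API.

```python
def simple_rnn_params(input_dim: int, units: int) -> int:
    """One set of input weights, recurrent weights and biases."""
    return units * (input_dim + units + 1)


def gru_params(input_dim: int, units: int) -> int:
    """Three gates/candidates, each the size of a simple RNN cell (classic variant)."""
    return 3 * simple_rnn_params(input_dim, units)


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates/candidates, each the size of a simple RNN cell."""
    return 4 * simple_rnn_params(input_dim, units)


INPUT_DIM = 100  # features per time step in the question's concatenated tensor
UNITS = 75       # RNN width used in the question

print(simple_rnn_params(INPUT_DIM, UNITS))  # 13200
print(gru_params(INPUT_DIM, UNITS))         # 39600
print(lstm_params(INPUT_DIM, UNITS))        # 52800
```

Concatenating along the feature axis would raise the input dimension to 300, so these counts would grow accordingly.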