I am currently trying to implement a swapout layer in Keras (as in: http://papers.nips.cc/paper/6205-swapout-learning-an-ensemble-of-deep-architectures.pdf). If I understand the concept correctly, the output of, say, a normal Dense layer is manipulated: for each unit, the result is chosen randomly among 0, x, F(x), and x + F(x), where x is the input and F(x) is the output the layer would normally produce.
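As a sanity check of that reading, the per-unit selection can be sketched in plain NumPy (the names theta1/theta2 for the two Bernoulli masks are my own, following the paper's notation; this is only an illustration, not the layer itself):

```python
import numpy as np

rng = np.random.default_rng()

x = np.array([1.0, 2.0, 3.0, 4.0])       # layer input
Fx = np.array([10.0, 20.0, 30.0, 40.0])  # what the layer would normally output

# independent Bernoulli(0.5) masks, one pair of bits per unit
theta1 = rng.integers(0, 2, size=x.shape).astype(np.float32)
theta2 = rng.integers(0, 2, size=x.shape).astype(np.float32)

# each unit ends up as one of: 0, x, F(x), x + F(x)
y = theta1 * x + theta2 * Fx
print(y)
```

Depending on the sampled bits, each entry of y is one of the four cases above.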
So I used the Dense layer as a base and edited the call method like this (everything else is exactly the same as in the Dense layer):
def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    # --------- edited -----------
    theta1 = np.random.randint(2, size=(1, 256))
    theta2 = np.random.randint(2, size=(1, 256))
    theta1 = theta1.astype(np.float32)
    theta2 = theta2.astype(np.float32)
    output = tf.add(tf.multiply(theta1, inputs), tf.multiply(theta2, output))
    # ----------------------------
    if self.activation is not None:
        output = self.activation(output)
    return output
So I simply compute the output as in the paper, Y = theta1 * inputs + theta2 * output, where * is element-wise multiplication. The shape of the output seems to be the same as without this computation, but when I try to run it in the following model:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Flatten
from swapout import Swapout

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Swapout(256, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
res = model.evaluate(x_test, y_test, verbose=2)
I get the following error:
2019-12-21 22:59:31.924209: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Incompatible shapes: [32,784] vs. [1,256]
[[{{node swapout_1/Mul}}]]
Traceback (most recent call last):
File "/home/andreas/studium/semester9/vertiefungs/neural.py", line 294, in <module>
runSingleMnistExperimant()
File "/home/andreas/studium/semester9/vertiefungs/neural.py", line 220, in runSingleMnistExperimant
model.fit(x_train, y_train, epochs=5)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3740, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/home/andreas/studium/my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,784] vs. [1,256]
[[node swapout_1/Mul (defined at /my_tensorflow/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_761]
Function call stack:
keras_scratch_graph
What am I doing wrong in the computation? Is there a better/simpler way to implement a swapout layer? I am quite new to Keras and TensorFlow ;)
Thanks a lot for your help!
Update: I just solved the problem. The issue was the shape of the input, (None, 784). Of course this cannot be multiplied element-wise with a (1, 256) vector. I also had to multiply one of the summands with a matrix so that both have the same shape... I chose zero-padding.
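The error from the traceback can be reproduced with plain NumPy, since element-wise multiplication follows the same broadcasting rules: a (32, 784) batch is not compatible with a (1, 256) mask.

```python
import numpy as np

a = np.zeros((32, 784))   # a batch of flattened MNIST inputs
m = np.zeros((1, 256))    # a mask sized for the 256 output units

try:
    a * m                 # element-wise multiply with incompatible shapes
except ValueError as e:
    print(e)              # operands could not be broadcast together ...
```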
Here is the solution:
def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    inputLength = K.int_shape(inputs)[-1]
    outputLength = K.int_shape(output)[-1]
    theta1 = np.random.randint(2, size=(1, inputLength))
    theta2 = np.random.randint(2, size=(1, outputLength))
    theta1 = theta1.astype(np.float32)
    theta2 = theta2.astype(np.float32)
    zeroPadding = getZeroPaddingMatrix(inputLength, outputLength)
    output = tf.add(tf.matmul(tf.multiply(theta1, inputs), zeroPadding),
                    tf.multiply(theta2, output))
    if self.activation is not None:
        output = self.activation(output)
    return output
The zero-padding matrix is computed as follows:
def getZeroPaddingMatrix(inputLength, outputLength):
    zeroPadding = np.identity(inputLength)
    if outputLength > inputLength:
        zeros = np.zeros((inputLength, outputLength - inputLength))
        zeroPadding = np.concatenate((zeroPadding, zeros), axis=1)
    elif outputLength < inputLength:
        zeroPadding = zeroPadding[:, (inputLength - outputLength):]
    return zeroPadding.astype(np.float32)
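For example, with 784 input features and 256 output units, the function truncates the identity so that the last 256 components of the input survive the projection (the demo below just repeats the function so it is self-contained):

```python
import numpy as np

def getZeroPaddingMatrix(inputLength, outputLength):
    zeroPadding = np.identity(inputLength)
    if outputLength > inputLength:
        zeros = np.zeros((inputLength, outputLength - inputLength))
        zeroPadding = np.concatenate((zeroPadding, zeros), axis=1)
    elif outputLength < inputLength:
        zeroPadding = zeroPadding[:, (inputLength - outputLength):]
    return zeroPadding.astype(np.float32)

P = getZeroPaddingMatrix(784, 256)
print(P.shape)            # (784, 256)

x = np.ones((1, 784), dtype=np.float32)
print((x @ P).shape)      # (1, 256): now shape-compatible with the layer output
```

In the opposite direction (fewer inputs than outputs), the identity is padded with zero columns instead, e.g. getZeroPaddingMatrix(3, 5) has shape (3, 5).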
The only thing left to do is to add parameters to vary theta1 and theta2.
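One way to parameterize them is to make the Bernoulli keep-probabilities arguments of the layer. A minimal NumPy sketch of the sampling (the names p1/p2 and sample_thetas are my own, not from the paper; note also that drawing the masks with np.random inside call fixes them at graph-construction time, so sampling inside the graph, e.g. with the Keras backend's random-binomial op, would be needed to get fresh masks per forward pass):

```python
import numpy as np

def sample_thetas(input_len, output_len, p1=0.5, p2=0.5, rng=None):
    """Sample the two swapout masks; p1/p2 are the probabilities
    that a unit keeps its input/output contribution (hypothetical
    parameter names for illustration)."""
    if rng is None:
        rng = np.random.default_rng()
    theta1 = (rng.random((1, input_len)) < p1).astype(np.float32)
    theta2 = (rng.random((1, output_len)) < p2).astype(np.float32)
    return theta1, theta2

theta1, theta2 = sample_thetas(784, 256, p1=0.25, p2=0.75)
```

With p1 = 0 and p2 = 1 this degenerates to a plain Dense layer; p1 = p2 = 1 gives a residual-style x + F(x).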