State persistence of a shared LSTM layer in Keras

Published 2024-10-04 11:28:22

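I am trying to use a shared LSTM layer with stateful=True in a Keras model, but it seems that every parallel use modifies the same internal state. This raises two questions: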

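  1. When a model with a shared LSTM layer is trained with stateful=True, do the parallel uses also update the same state during training?
  2. If my observation is valid, is there a way to use weight-sharing LSTMs such that the state is stored independently for each parallel use?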

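The code below illustrates the problem with three sequences sharing one LSTM. It compares the prediction for the full input with the result of splitting the input into two halves and feeding them to the network one after the other.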

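What can be observed is that a1 is identical to the first half of aFull, which means that during the first prediction the uses of the LSTM really are parallel with independent states: z1 is not affected by the parallel calls that create z2 and z3. However, a2 differs from the second half of aFull, so there is some interaction between the states of the parallel uses.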

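What I was hoping for is that the concatenation of the two pieces a1 and a2 would be identical to the result of calling predict with the longer input sequence, but this does not seem to be the case. A further concern is whether this interaction, since it shows up in prediction, also occurs during training.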

import keras
import keras.backend as K
import numpy as np

nOut = 3
xShape = (3, 50, 4)
inShape = (xShape[0], None, xShape[2])   
batchInShape = (1, ) + inShape
x = np.random.randn(*xShape)

# construct network
# only batch_shape is passed; some Keras versions reject shape and batch_shape together
xIn = keras.layers.Input(batch_shape=batchInShape)

# shared LSTM layer
sharedLSTM = keras.layers.LSTM(units=nOut, stateful=True, return_sequences=True, return_state=False)

# split the input on the first axis
x1 = keras.layers.Lambda(lambda x: x[:,0,:,:])(xIn)
x2 = keras.layers.Lambda(lambda x: x[:,1,:,:])(xIn)
x3 = keras.layers.Lambda(lambda x: x[:,2,:,:])(xIn)

# pass each input through the LSTM
z1 = sharedLSTM(x1)
z2 = sharedLSTM(x2)
z3 = sharedLSTM(x3)

# add a singleton dimension
y1 = keras.layers.Lambda(lambda x: K.expand_dims(x, axis=1))(z1)
y2 = keras.layers.Lambda(lambda x: K.expand_dims(x, axis=1))(z2)
y3 = keras.layers.Lambda(lambda x: K.expand_dims(x, axis=1))(z3)

# combine the outputs
y = keras.layers.Concatenate(axis=1)([y1, y2, y3])

model = keras.models.Model(inputs=xIn, outputs=y)
model.compile(loss='mse', optimizer='adam')
model.summary()

# no need to train, since we are only interested in what happens mechanically

# reset to a known state and predict for full input
model.reset_states()
aFull = model.predict(x[np.newaxis,:,:,:])

# reset to a known state and predict for the same input, but in two pieces
model.reset_states()
a1 = model.predict(x[np.newaxis,:,:xShape[1]//2,:])
a2 = model.predict(x[np.newaxis,:,xShape[1]//2:,:])
# combine the pieces
aSplit = np.concatenate((a1, a2), axis=2)

# summed absolute differences: whole sequence, first half, second half
half = xShape[1] // 2
fullDiff = np.sum(np.abs(aFull - aSplit))
firstDiff = np.sum(np.abs(aFull[:, :, :half, :] - aSplit[:, :, :half, :]))
secondDiff = np.sum(np.abs(aFull[:, :, half:, :] - aSplit[:, :, half:, :]))
print('full diff: {}, first half diff: {}, second half diff: {}'.format(fullDiff, firstDiff, secondDiff))

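Update: The behaviour above was observed with Keras using TensorFlow 1.14 and 1.15 as the backend. Running the same code with TF 2.0 (with adjusted imports) changes the result, so that a1 is no longer identical to the first half of aFull. That equality can still be obtained by setting stateful=False when instantiating the layer.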

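This suggests to me that the way I am trying to use a recurrent layer, with shared parameters but with each parallel use keeping its own state, is not actually possible like this.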
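One possible reading of the TF 1.x observation, offered as an assumption and not as a confirmed description of Keras internals: a stateful layer owns a single set of state variables, so every parallel call starts from the same stored state and each call writes its final state back into the same slot. The toy below is plain Python with a one-unit tanh recurrence standing in for the LSTM; all names and the "last writer wins" rule are hypothetical. It only reproduces the qualitative pattern described above: the first half matches, the second half does not.

```python
import math

W_X, W_H, BIAS = 0.7, -0.4, 0.1  # arbitrary weights shared by every band


def run(seq, h):
    """Run a one-unit tanh recurrence over seq, starting from state h."""
    out = []
    for x_t in seq:
        h = math.tanh(W_X * x_t + W_H * h + BIAS)
        out.append(h)
    return out, h


def predict_shared_slot(bands, slot):
    """Every band starts from the one shared slot and overwrites it afterwards."""
    start = slot["h"]
    outputs = []
    for seq in bands:
        out, final_h = run(seq, start)
        slot["h"] = final_h  # assumption: the last writer wins
        outputs.append(out)
    return outputs


def total_abs_diff(a, b):
    """Sum of absolute element-wise differences between two nested lists."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


n_bands, n_steps = 3, 10
half = n_steps // 2
# deterministic toy inputs, one sequence per band
bands = [[math.sin(0.9 * t + b) for t in range(n_steps)] for b in range(n_bands)]

# full-length prediction from a zero state
full = predict_shared_slot(bands, {"h": 0.0})

# the same input in two pieces, keeping the slot between the calls
slot = {"h": 0.0}
first = predict_shared_slot([s[:half] for s in bands], slot)
second = predict_shared_slot([s[half:] for s in bands], slot)

first_half_diff = total_abs_diff([o[:half] for o in full], first)
second_half_diff = total_abs_diff([o[half:] for o in full], second)
print("first half diff:", first_half_diff)
print("second half diff:", second_half_diff)
```

In this toy only the band that wrote last continues from its own state in the second call; the other bands continue from a state that belongs to a different sequence.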

Update 2: It seems that others have missed the same functionality before: closed, unanswered question at Keras' github.

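For comparison, here is a scribble in PyTorch (my first attempt at using it) that implements a simple network in which N parallel LSTMs share weights but have independent states. In this case the states are stored explicitly in a list and passed to the LSTM manually.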

import torch
import numpy as np

class sharedLSTM(torch.nn.Module):

    def __init__(self, batchSz, nBands, nDims, outDim):
        super(sharedLSTM, self).__init__()
        self.internalLSTM = torch.nn.LSTM(input_size=nDims, hidden_size=outDim, num_layers=1, bias=True, batch_first=True)
        allStates = list()
        for bandIdx in range(nBands):
            h_0 = torch.zeros(1, batchSz, outDim)
            c_0 = torch.zeros(1, batchSz, outDim)
            allStates.append((h_0, c_0))

        self.allStates = allStates            
        self.nBands = nBands

    def forward(self, x):
        allOut = list()
        for dimIdx in range(self.nBands):
            thisSlice = x[:,dimIdx,:,:] # (batchSz, nSteps, nFeats)
            thisState = self.allStates[dimIdx]

            thisY, thisState = self.internalLSTM(thisSlice, thisState) 
            self.allStates[dimIdx] = thisState
            allOut.append(thisY[:,None,:,:]) # => (batchSz, 1, nSteps, outDim)

        y = torch.cat(allOut, dim=1) # => (batchSz, nBands, nSteps, outDim)

        return y

    def resetStates(self):
        # use the stored band count, not the module-level global
        for bandIdx in range(self.nBands):
            self.allStates[bandIdx][0][:] = 0.0
            self.allStates[bandIdx][1][:] = 0.0


batchSz = 5
nBands = 3
nFeats = 4
nOutDims = 2
net = sharedLSTM(batchSz, nBands, nFeats, nOutDims)
net = net.float()
print(net)

N = 20
x = torch.from_numpy(np.random.rand(batchSz, nBands, N, nFeats)).float()
x1 = x[:, :, :N//2, :]
x2 = x[:, :, N//2:, :]

aa = net.forward(x)
net.resetStates()
a1 = net.forward(x1)
a2 = net.forward(x2)

print('(with reset) first half abs diff: {}'.format(str(torch.sum(torch.abs(a1 - aa[:,:,:N//2,:])).detach().numpy())))
print('(with reset) second half abs diff: {}'.format(str(torch.sum(torch.abs(a2 - aa[:,:,N//2:,:])).detach().numpy())))

Result: the output is the same whether the prediction is made in one go or in pieces.
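The principle behind this result can also be sketched without any framework. The following is a minimal illustration, not the PyTorch or Keras implementation: TinyLSTMCell, run_bands, and the weight layout are hypothetical names chosen for this example. One set of weights is shared by all bands, while each band reads and writes its own (h, c) pair from an explicit list.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyLSTMCell:
    """Minimal LSTM cell in pure Python; one set of weights for all callers."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # per gate, one row per hidden unit: [input weights, recurrent weights, bias]
        self.w = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden + 1)]
                         for _ in range(n_hidden)]
                  for gate in "ifog"}

    def zero_state(self):
        return ([0.0] * self.n_hidden, [0.0] * self.n_hidden)

    def step(self, x_t, state):
        """Advance one time step; returns the output and the new (h, c) state."""
        h, c = state
        z = list(x_t) + list(h) + [1.0]
        pre = {g: [sum(w * v for w, v in zip(row, z)) for row in rows]
               for g, rows in self.w.items()}
        new_c = [sigmoid(pre["f"][k]) * c[k]
                 + sigmoid(pre["i"][k]) * math.tanh(pre["g"][k])
                 for k in range(self.n_hidden)]
        new_h = [sigmoid(pre["o"][k]) * math.tanh(new_c[k])
                 for k in range(self.n_hidden)]
        return new_h, (new_h, new_c)


def run_bands(cell, bands, states):
    """Run the shared cell over each band, using and updating that band's own state."""
    outputs = []
    for b, seq in enumerate(bands):
        state = states[b]
        out = []
        for x_t in seq:
            h, state = cell.step(x_t, state)
            out.append(h)
        states[b] = state  # stored per band, so bands never interact
        outputs.append(out)
    return outputs


n_bands, n_steps, n_feats, n_hidden = 3, 10, 4, 2
half = n_steps // 2
# deterministic toy input: bands[b][t] is a feature vector of length n_feats
bands = [[[math.sin(0.7 * t + b + 0.3 * f) for f in range(n_feats)]
          for t in range(n_steps)] for b in range(n_bands)]
cell = TinyLSTMCell(n_feats, n_hidden)

# prediction over the full sequences, starting from zero states
full = run_bands(cell, bands, [cell.zero_state() for _ in range(n_bands)])

# the same input in two pieces, carrying the per-band states across the calls
states = [cell.zero_state() for _ in range(n_bands)]
first = run_bands(cell, [s[:half] for s in bands], states)
second = run_bands(cell, [s[half:] for s in bands], states)
split = [a + b for a, b in zip(first, second)]

max_diff = max(abs(p - q)
               for fb, sb in zip(full, split)
               for fh, sh in zip(fb, sb)
               for p, q in zip(fh, sh))
print("max abs diff with per-band states:", max_diff)
```

Because each band continues from its own stored state, the second call picks up exactly where the first one stopped, which is the behaviour the question asks for from a stateful shared layer.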

I tried to replicate this in Keras using subclassing, but without success:

import keras
import numpy as np

class sharedLSTM(keras.Model):
    def __init__(self, batchSz, nBands, nDims, outDim):
        super(sharedLSTM, self).__init__()
        self.internalLSTM = keras.layers.LSTM(units=outDim, stateful=True, return_sequences=True, return_state=True)
        self.internalLSTM.build((batchSz, None, nDims))
        self.internalLSTM.reset_states()
        allStates = list()
        allSlicers = list()
        for bandIdx in range(nBands):
            allStates.append(None)
            allSlicers.append(keras.layers.Lambda(lambda x, b: x[:, :, b, :], arguments = {'b' : bandIdx}))

        self.allStates = allStates            
        self.allSlicers = allSlicers
        self.Concat = keras.layers.Lambda(lambda x: keras.backend.concatenate(x, axis=2))

        self.nBands = nBands

    def call(self, x):
        allOut = list()
        for bandIdx in range(self.nBands):
            thisSlice = self.allSlicers[bandIdx]( x )
            thisState = self.allStates[bandIdx]

            thisY, *thisState = self.internalLSTM(thisSlice, initial_state=thisState) 
            self.allStates[bandIdx] = thisState.copy()
            allOut.append(thisY[:,:,None,:]) 

        y = self.Concat( allOut )
        return y

batchSz = 1
nBands = 3
nFeats = 4
nOutDims = 2
N = 20

model = sharedLSTM(batchSz, nBands, nFeats, nOutDims)
model.compile(optimizer='SGD', loss='mae')

x = np.random.rand(batchSz, N, nBands, nFeats)
x1 = x[:, :N//2, :, :]
x2 = x[:, N//2:, :, :]

aa = model.predict(x)

model.reset_states()
a1 = model.predict(x1)
a2 = model.predict(x2)

print('(with reset) first half abs diff: {}'.format(str(np.sum(np.abs(a1 - aa[:,:N//2,:,:])))))
print('(with reset) second half abs diff: {}'.format(str(np.sum(np.abs(a2 - aa[:,N//2:,:,:])))))

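If you now ask "why don't you just use torch and shut up?", the answer is that the surrounding experimental framework has been built on the assumption of Keras, and changing it would be a non-negligible amount of work.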

