Implementation problems with a multi-layer perceptron

Posted 2024-10-01 17:31:48


I am trying to build a multi-layer perceptron to classify a dataset of hand-drawn digits obtained from the MNIST database. It implements 2 hidden layers with sigmoid activation functions, while the output layer uses softmax. However, for whatever reason, I cannot get it to work. I have attached my training loop in the code below, which is where I believe the problem lies. Can anyone identify a possible issue with my perceptron implementation?

    def train(self, inputs, targets, eta, niterations):
        """
        inputs is a numpy array of shape (num_train, D) containing the training images
                    consisting of num_train samples each of dimension D.

        targets is a numpy array of shape (num_train, num_classes) containing the training labels
                    consisting of num_train samples each of dimension num_classes (one-hot).

        eta is the learning rate for optimization 
        niterations is the number of iterations for updating the weights 

        """
        ndata = np.shape(inputs)[0]  # number of data samples
        # adding the bias
        inputs = np.concatenate((inputs, -np.ones((ndata, 1))), axis=1)

        # numpy arrays to store the weight updates (used for the momentum term)
        updatew1 = np.zeros((np.shape(self.weights1)))
        updatew2 = np.zeros((np.shape(self.weights2)))
        updatew3 = np.zeros((np.shape(self.weights3)))

        for n in range(niterations):

            # forward phase
            self.outputs = self.forwardPass(inputs)

            # Error using the sum-of-squares error function
            error = 0.5*np.sum((self.outputs-targets)**2)

            if (np.mod(n, 100) == 0):
                print("Iteration: ", n, " Error: ", error)

            # backward phase
            deltao = self.outputs - targets
            placeholder = np.zeros(np.shape(self.outputs))
            for j in range(np.shape(self.outputs)[1]):
                y = self.outputs[:, j]
                placeholder[:, j] = y * (1 - y)
                for y in range(np.shape(self.outputs)[1]):
                    if not y == j:
                        placeholder[:, j] += -y * self.outputs[:, y]
            deltao *= placeholder
            # compute the derivative of the second hidden layer
            deltah2 = np.dot(deltao, np.transpose(self.weights3))
            deltah2 = self.hidden2*self.beta*(1.0-self.hidden2)*deltah2
            # compute the derivative of the first hidden layer
            deltah1 = np.dot(deltah2[:, :-1], np.transpose(self.weights2))
            deltah1 = self.hidden1*self.beta*(1.0-self.hidden1)*deltah1
            # update the weights of the three layers: self.weights1, self.weights2 and self.weights3
            updatew1 = eta*(np.dot(np.transpose(inputs), deltah1[:, :-1])) + (self.momentum * updatew1)
            updatew2 = eta*(np.dot(np.transpose(self.hidden1), deltah2[:, :-1])) + (self.momentum * updatew2)
            updatew3 = eta*(np.dot(np.transpose(self.hidden2), deltao)) + (self.momentum * updatew3)

            self.weights1 -= updatew1
            self.weights2 -= updatew2
            self.weights3 -= updatew3

    def forwardPass(self, inputs):
        """
            inputs is a numpy array of shape (num_train, D) containing the training images
                    consisting of num_train samples each of dimension D.  
        """
        # layer 1
        # the forward pass on the first hidden layer with the sigmoid function
        self.hidden1 = np.dot(inputs, self.weights1)
        self.hidden1 = 1.0/(1.0+np.exp(-self.beta*self.hidden1))
        self.hidden1 = np.concatenate((self.hidden1, -np.ones((np.shape(self.hidden1)[0], 1))), axis=1)
        # layer 2
        # the forward pass on the second hidden layer with the sigmoid function
        self.hidden2 = np.dot(self.hidden1, self.weights2)
        self.hidden2 = 1.0/(1.0+np.exp(-self.beta*self.hidden2))
        self.hidden2 = np.concatenate((self.hidden2, -np.ones((np.shape(self.hidden2)[0], 1))), axis=1)

        # output layer
        # the forward pass on the output layer with softmax function
        outputs = np.dot(self.hidden2, self.weights3)
        outputs = np.exp(outputs)
        outputs /= np.repeat(np.sum(outputs, axis=1),outputs.shape[1], axis=0).reshape(outputs.shape)
        return outputs
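
As an aside, exponentiating raw logits the way the last three lines do can overflow np.exp for large activations, which may be the source of the overflow error mentioned in the first update below. A common remedy, sketched here as a helper method (the name stableSoftmax is mine, and the rest of the class is assumed unchanged), is to subtract each row's maximum before exponentiating:

    def stableSoftmax(self, logits):
        """Row-wise softmax with the usual max-subtraction trick."""
        # shifting each row so its maximum is 0 leaves the softmax value
        # unchanged (the shift cancels between numerator and denominator)
        # but keeps the arguments of np.exp() non-positive, avoiding overflow
        shifted = logits - np.max(logits, axis=1, keepdims=True)
        exps = np.exp(shifted)
        return exps / np.sum(exps, axis=1, keepdims=True)

The output layer in forwardPass could then be computed as:

    outputs = self.stableSoftmax(np.dot(self.hidden2, self.weights3))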

Update: I found one thing I had messed up in the backpropagation through the softmax. The actual deltao should be:

            deltao = self.outputs - targets
            placeholder = np.zeros(np.shape(self.outputs))
            for j in range(np.shape(self.outputs)[1]):
                y = self.outputs[:, j]
                placeholder[:, j] = y * (1 - y)
                # the counter for the for loop below used to also be named y, causing confusion
                for i in range(np.shape(self.outputs)[1]):
                    if not i == j:
                        placeholder[:, j] += -y * self.outputs[:, i]
            deltao *= placeholder
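
For reference, the double loop above can be collapsed algebraically into a vectorized sketch (assuming self.outputs holds the row-wise softmax outputs):

    # algebraically, the loop computes
    #   placeholder[:, j] = y_j*(1 - y_j) - y_j * sum_{i != j} y_i = y_j*(1 - sum_i y_i)
    row_sums = np.sum(self.outputs, axis=1, keepdims=True)
    placeholder = self.outputs * (1.0 - row_sums)

Note that if every row of self.outputs sums to 1, as softmax outputs do by construction, this product is identically zero, which may be related to the stalled accuracy described below.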

After this correction, the overflow error seems to have resolved itself. However, a new problem has appeared: no matter what I try, and no matter which variables I change, the perceptron's accuracy will not exceed 15%.

Second update: after a long time, I finally found a way to get the code working. I had to change the backpropagation of the softmax (called deltao in the code) to the following:

    deltao = np.exp(self.outputs)
    deltao /= np.repeat(np.sum(deltao, axis=1), deltao.shape[1]).reshape(deltao.shape)
    deltao = deltao * (1 - deltao)
    deltao *= (self.outputs - targets) / np.shape(inputs)[0]

The only problem is that I do not know why this is the derivative of the softmax. Can someone explain it?
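
For context (this is background, not necessarily an explanation of the snippet above): writing z for the pre-softmax activations and y = softmax(z), the textbook Jacobian of the softmax is

    \frac{\partial y_i}{\partial z_j} = y_i\,(\delta_{ij} - y_j)

and when the softmax is paired with the cross-entropy loss E = -\sum_i t_i \log y_i, the chain rule collapses to

    \frac{\partial E}{\partial z_j} = \sum_i \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_j} = y_j - t_j

i.e. the delta is simply the outputs minus the targets (divided by the batch size if the loss is averaged). With the sum-of-squares error used in this code, the full Jacobian-vector product \sum_i (y_i - t_i)\, y_i (\delta_{ij} - y_j) would be needed instead.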

