Neural network fails to converge on a multi-label classification task

Posted 2024-10-01 17:25:33


I am a TensorFlow beginner and I really need some help with this task. I am doing per-pixel image classification, and my problem is set up as follows:

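My input is an array X of 20 values. These values represent 4 pixels (5 values per pixel). My output is an array of 4 values, each either 1 or 0, indicating whether the corresponding pixel has a particular feature. As you can imagine, y can look like y=[1, 0, 0, 1], so each instance can belong to several classes at once.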
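To make the layout concrete, here is a minimal sketch of how one sample could be organised. It assumes the 20 values are stored pixel by pixel (the five values of pixel 0 first, then pixel 1, and so on); that ordering, the placeholder values, and the helper name `split_pixels` are assumptions for illustration only.

```python
def split_pixels(x, n_pixels=4, n_features=5):
    """Split one flat sample into per-pixel lists (assumes pixel-major order)."""
    expected = n_pixels * n_features
    if len(x) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(x)))
    return [x[i * n_features:(i + 1) * n_features] for i in range(n_pixels)]


sample_x = [float(v) for v in range(20)]  # placeholder feature values
sample_y = [1, 0, 0, 1]                   # pixels 0 and 3 have the feature

pixels = split_pixels(sample_x)
for pixel_values, label in zip(pixels, sample_y):
    print(pixel_values, "->", label)
```

Each of the four labels is an independent yes/no decision, which is why this is a multi-label problem rather than a 4-way multi-class one.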

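To solve this classification task I built a neural network with an input layer of 20 neurons, a hidden layer of 15, a second hidden layer of 10, and a final output layer of 4. The hidden layers use ReLU activations, and I apply 50% dropout to them. As the loss to minimize I use TensorFlow's sigmoid_cross_entropy_with_logits, because it treats each class as an independent probability, which allows multi-label classification.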
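For reference, the per-label quantity that a sigmoid cross-entropy on logits computes can be sketched in plain Python. This uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) for a logit x and a label z; the function names here are illustrative and not part of TensorFlow.

```python
import math


def sigmoid_xent(logit, label):
    """Stable sigmoid cross-entropy for one logit and one 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_multilabel_loss(logits, labels):
    """Average the independent per-label losses of one sample."""
    losses = [sigmoid_xent(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)


# A network that outputs a logit of 0 for every label (probability 0.5)
# has a loss of ln(2) per label, whatever the labels are.
chance_loss = mean_multilabel_loss([0.0, 0.0, 0.0, 0.0], [1, 0, 0, 1])
print(chance_loss)
```

A training loss that stays close to ln(2), roughly 0.693, is therefore a sign that the outputs have not moved away from 0.5.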

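The first time I tried to train the network, all my results were NaN, because of (I think) an exploding-gradient problem. This apparently went away after I lowered the learning rate.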
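Lowering the learning rate is one way to tame exploding gradients; another common one is clipping gradients by their global norm. The following is only a plain-Python sketch of that idea, not the code used in this post, and `clip_by_global_norm` here is a hypothetical helper written for illustration.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale a flat list of gradients so its L2 norm is at most clip_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= clip_norm:
        return list(grads), global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], clip_norm=1.0)
print(norm_before, clipped)
```

Clipping keeps the direction of the update and only limits its size, so it does not require shrinking the learning rate for every step.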

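My problem now is that the network does not converge at all, and I believe this is because something is wrong with the cost function and activation functions I am using.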
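As a rough sanity check, it may help to estimate how far plain gradient descent can move a single weight with the settings in the code below (learning rate 1e-9, 50 epochs of 1,000,000 batches). The gradient magnitude of 1.0 is purely an assumption for this back-of-the-envelope sketch; real gradients vary from step to step.

```python
learning_rate = 1e-9      # value used in the code below
n_epochs = 50
n_batches = 1000000
assumed_grad = 1.0        # assumption: typical gradient magnitude per step

total_steps = n_epochs * n_batches
# Upper bound on the total change of one weight if every step pushed
# in the same direction with the assumed gradient magnitude.
max_shift = learning_rate * assumed_grad * total_steps

print("steps:", total_steps)
print("largest possible shift of one weight:", max_shift)
```

Under that assumption the whole run can change a weight by about 0.05 at most, so a learning rate this small can by itself look like a failure to converge.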

Note: the inputs have already been scaled with sklearn.preprocessing.StandardScaler.
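For readers unfamiliar with it, standard scaling centres each feature column on zero and divides by its standard deviation. The following is a plain-Python sketch of that transformation, not the scikit-learn implementation; `standardize_columns` is an illustrative name, and constant columns are divided by 1 to avoid a division by zero.

```python
import math


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / n
        std = math.sqrt(variance)
        stds.append(std if std > 0.0 else 1.0)  # constant column: leave at 0
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


scaled = standardize_columns([[1.0, 10.0], [3.0, 10.0]])
print(scaled)
```

With inputs on this scale, the raw feature magnitudes are unlikely to be what forces an extremely small learning rate.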

The code is as follows:

import numpy as np
import tensorflow as tf

n_inputs = 20
n_hidden1 = 15
n_hidden2 = 10
n_outputs = 4

dropout_rate = 0.5
learning_rate = 0.000000001


#This is a boolean value that indicates when to apply dropout
training = tf.placeholder_with_default(True, shape=(), name="training")

X = tf.placeholder(tf.float64, shape=(None, n_inputs), name ="X")
y = tf.placeholder(tf.int64, shape=(None, 4), name = "y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation = tf.nn.relu)
    hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training = training)

    # The second hidden layer takes the dropout output of the first one
    hidden2 = tf.layers.dense(hidden1_drop, n_hidden2, name="hidden2", activation=tf.nn.relu)
    hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training = training)

    logits = tf.layers.dense(hidden2_drop, n_outputs, name="outputs")


with tf.name_scope("loss"):
    xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels = tf.cast(y, tf.float64), logits = tf.cast(logits, tf.float64))
    loss = tf.reduce_mean(xentropy, name = 'loss')


with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 50
batch_size = 50
n_batches = 1000000

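# Training loop. X_batches, y_batches, get_global_accuracy_rate and
# get_prediction are defined elsewhere (not shown here).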
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for i in range(n_batches):
            X_batch = np.asarray(X_batches[i]).reshape(-1, 20)
            y_batch = np.asarray(y_batches[i]).reshape(-1, 4)
            sess.run(training_op, feed_dict ={X: X_batch, y: y_batch, training:True})
            if (i % 10000) == 0:
                raws = logits.eval(feed_dict={X: X_batch, training:False})
                print("epoca = "+str(epoch))
                print("iterazione = "+str(i))
                print("accuratezza = "+str(get_global_accuracy_rate(raws, y_batch)))
                print("X = "+str(X_batch[0])+ " y = "+str(y_batch[0]))
                print("raws = "+str(raws[0])+" pred = " + str(get_prediction(raws[0])))


    save_path= saver.save(sess, "./my_model_final_1.ckpt")
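The helpers `get_prediction` and `get_global_accuracy_rate` are not shown above. For anyone trying to reproduce the loop, here is one plausible stand-in, written under the assumption that a label is predicted as 1 when its sigmoid probability exceeds 0.5 (equivalently, when its logit is above 0) and that accuracy is counted over every individual label. The names `predict_from_logits` and `labelwise_accuracy` are hypothetical and are not the original functions.

```python
def predict_from_logits(logits_row):
    """Turn one row of logits into 0/1 labels (sigmoid > 0.5 means logit > 0)."""
    return [1 if value > 0.0 else 0 for value in logits_row]


def labelwise_accuracy(logits_rows, label_rows):
    """Fraction of individual labels predicted correctly across a batch."""
    correct = 0
    total = 0
    for logits_row, label_row in zip(logits_rows, label_rows):
        for pred, label in zip(predict_from_logits(logits_row), label_row):
            correct += int(pred == label)
            total += 1
    return correct / total if total else 0.0


batch_logits = [[2.0, -1.0, 0.5, 3.0]]
batch_labels = [[1, 0, 0, 1]]
print(predict_from_logits(batch_logits[0]))
print(labelwise_accuracy(batch_logits, batch_labels))
```

Counting per label rather than per sample gives partial credit when only some of the four pixels are classified correctly.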

Thank you very much for your help!

