Extremely high cost

Published 2024-06-26 18:07:20


I'm trying to do price prediction on a Kaggle dataset with TensorFlow. My neural network is learning, but my cost function is extremely high and my predictions are far from the real outputs. I tried changing my network by adding or removing layers, neurons, and activation functions. I tried a lot of hyperparameters, but that didn't change much. I don't think the problem comes from my data; I checked on Kaggle, and it's the dataset most people use.

If you know why my cost is so high and how to reduce it, and if you could explain it to me, that would be really great!

Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.utils import shuffle

df = pd.read_csv(r"C:\Users\User\Documents\TENSORFLOW\Prediction prix\train2.csv", sep=';')
df.head()

df = df.loc[:, ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'SalePrice']]

df = df.replace(np.nan, 0)

df

%matplotlib inline
grid = sns.pairplot(df)  # avoid rebinding plt, which shadows matplotlib.pyplot
grid

df = shuffle(df)

df_train = df[0:1000]
df_test = df[1001:1451]

inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)

inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)

inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)

inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)



# Parameters
learning_rate = 0.01
training_epochs = 1000
batch_size = 500
display_step = 50

n_samples = inputX.shape[0]


x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])


def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.1))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output


l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)

pred = add_layer(l1, 3, 1)


# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = batch_size
        # Loop over all batches
        for i in range(total_batch):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: inputX,
                                                          y: inputY})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(pred,y)
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: inputX, y: inputY}))
    print(sess.run(pred, feed_dict={x: inputX_test}))

Epoch: 0001 cost= 10142407502702304395526144.000000000

Epoch: 0051 cost= 3256106752.000019550

Epoch: 0101 cost= 3256106752.000019550

Epoch: 0151 cost= 3256106752.000019550

Epoch: 0201 cost= 3256106752.000019550

...

Thanks for your help!


2 Answers

I've had a similar problem before: a very high cost after a few training steps, after which the cost stayed constant. For me it was a kind of overflow; the gradients were too large and created NaN values early in training. The way I solved it was to start with a smaller learning rate (possibly much smaller) until the cost and gradients became more reasonable (a few dozen steps), and then go back to a regular learning rate (higher at the start, possibly with decay).
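The warm-up idea above can be sketched as a simple schedule; the function name and the rate values here are illustrative, not from the original code. In TF 1.x you could feed the returned value through a tf.placeholder passed as the optimizer's learning rate.

```python
def warmup_learning_rate(step, warmup_steps=50, small_lr=1e-5, base_lr=1e-2):
    """Illustrative sketch: use a tiny learning rate for the first few dozen
    steps so early, huge gradients cannot blow the weights up to NaN, then
    switch back to the regular rate."""
    return small_lr if step < warmup_steps else base_lr
```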

See my answer to this post for a similar case that was solved simply by using a smaller learning rate at the beginning.

You can also clip the gradients with tf.clip_by_value to avoid this problem. It sets a minimum and a maximum value for the gradients, which avoids the huge gradients that send the weights straight to NaN in the first few iterations. To use it (with min and max values of -1 and 1, which may be too tight), replace

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with

opt = tf.train.GradientDescentOptimizer(learning_rate)
gvs = opt.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
optimizer = opt.apply_gradients(capped_gvs)

I found several issues in your implementation:

  1. The inputs are not scaled.
    Use sklearn's StandardScaler to scale the inputs inputX and inputY (as well as inputX_test and inputY_test) to zero mean and unit variance. You can use inverse_transform to convert the outputs back to the proper scale.

    sc = StandardScaler().fit(inputX)
    inputX = sc.transform(inputX)
    inputX_test = sc.transform(inputX_test)
    
  2. The batch size is too large; you are passing the whole set as a single batch. That should not cause the specific problem you are facing, but for better convergence, try reducing the batch size. Implement a get_batch() generator function and do:

    for batch_X, batch_Y in get_batch(input_X, input_Y, batch_size):
       _, c = sess.run([optimizer, cost], feed_dict={x: batch_X,
                                                  y: batch_Y})
    
  3. If you still see issues, try a smaller weight initialization (stddev).
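The scaling in point 1 can be illustrated on toy numbers (the price values below are made up; sc mirrors the StandardScaler usage in the answer):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy SalePrice column standing in for the real data.
prices = np.array([[100000.], [150000.], [200000.]])

sc = StandardScaler().fit(prices)
scaled = sc.transform(prices)            # zero mean, unit variance
restored = sc.inverse_transform(scaled)  # back to the original price scale

print(np.allclose(restored, prices))  # True
```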

Working code below:

from sklearn.preprocessing import StandardScaler

inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)
sc = StandardScaler().fit(inputX)
inputX = sc.transform(inputX)

inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)
sc1 = StandardScaler().fit(inputY)
inputY = sc1.transform(inputY)

inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)
inputX_test = sc.transform(inputX_test)

inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)
inputY_test = sc1.transform(inputY_test)

learning_rate = 0.01
training_epochs = 1000
batch_size = 50
display_step = 50

n_samples = inputX.shape[0]

x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])

def get_batch(inputX, inputY, batch_size):
  duration = len(inputX)
  for i in range(0,duration//batch_size):
    idx = i*batch_size
    yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]


def add_layer(inputs, in_size, out_size, activation_function=None):
  Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.005))
  biases = tf.Variable(tf.zeros([1, out_size]))
  Wx_plus_b = tf.matmul(inputs, Weights) + biases
  if activation_function is None:
    output = Wx_plus_b
  else:
    output = activation_function(Wx_plus_b)
  return output


l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)

pred = add_layer(l1, 3, 1)

# Mean squared error
cost = tf.reduce_mean(tf.pow(tf.subtract(pred, y), 2))
# Gradient descent
optimizer =   tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = len(inputX) // batch_size  # number of mini-batches per epoch
        # Loop over all batches
        for batch_x, batch_y in get_batch(inputX, inputY, batch_size):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, _l1, _pred = sess.run([optimizer, cost, l1, pred], feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f} ".format(avg_cost))

    print("Optimization Finished!")
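One caveat about the working code: because inputY was scaled, the network's predictions come out in scaled units, so they must be mapped back before being read as prices. In the code above that is a call to sc1.inverse_transform on the predictions; a minimal numpy sketch of the same mapping (the mean and scale values here are hypothetical):

```python
import numpy as np

# Undo standardization by hand: x_original = x_scaled * scale + mean.
# This is what StandardScaler.inverse_transform computes internally.
def to_dollars(preds_scaled, mean, scale):
    return preds_scaled * scale + mean

print(to_dollars(np.array([0.0, 1.0]), mean=180000.0, scale=50000.0))
# -> [180000. 230000.]
```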
