Keras neural network regression model: loss is small but accuracy is 0


I have a problem with a neural network regression model in Keras. I am working on a car dataset, predicting the price from 13 dimensions. In short, I read the data in as a pandas dataframe, convert the numeric values to float, scale the values, and then use one-hot encoding for the categorical values, which creates many new columns, but I am not too concerned about that right now. What worries me is that the accuracy is literally 0%, and I don't know why. The dataset can be found here: https://www.kaggle.com/CooperUnion/cardataset/data. The code is below:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from keras.utils import to_categorical

# load dataset
# Columns : Make, Model, Year, Engine Fuel Type, Engine HP, Engine Cylinders, Transmission Type, Driven_Wheels, Number of Doors, Vehicle Size, Vehicle Style, highway MPG, city mpg, Popularity, MSRP

import pandas as pd
dataframe = pd.read_csv("cars.csv", header = 'infer', names=['Make', 'Model', 'Year', 'Engine Fuel Type', 'Engine HP', 'Engine Cylinders', 'Transmission Type', 'Driven_Wheels', 'Number of Doors', 'Vehicle Size', 'Vehicle Style', 'highway MPG', 'city mpg', 'Popularity', 'MSRP'])

#convert data columns to float
dataframe[['Engine HP', 'highway MPG', 'city mpg', 'Popularity', 'MSRP']] = dataframe[['Engine HP', 'highway MPG', 'city mpg', 'Popularity', 'MSRP']].apply(pd.to_numeric)

#normalize the values - divide by their max value
dataframe["Engine HP"] = dataframe["Engine HP"] / dataframe["Engine HP"].max()

dataframe["highway MPG"] = dataframe["highway MPG"] / dataframe["highway MPG"].max()

dataframe["city mpg"] = dataframe["city mpg"] / dataframe["city mpg"].max()

dataframe["Popularity"] = dataframe["Popularity"] / dataframe["Popularity"].max()

dataframe["MSRP"] = dataframe["MSRP"] / dataframe["MSRP"].max()

#split input and label
x = dataframe.iloc[:,0:14] 
y = dataframe.iloc[:,14] 

#one-hot encoding for categorical values

def one_hot(df, cols):
    for each in cols:
        dummies = pd.get_dummies(df[each], prefix=each, drop_first=False)
        df = pd.concat([df, dummies], axis=1)
    return df

#columns to transform
cols_to_tran = ['Make', 'Model', 'Year', 'Engine Fuel Type', 'Engine Cylinders', 'Transmission Type', 'Driven_Wheels', 'Number of Doors', 'Vehicle Size', 'Vehicle Style']
d = one_hot(x, cols_to_tran)

list(d.columns.values)

#drop the first 11 original columns
e = d.drop(d.columns[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], axis=1)
list(e.columns.values)

#create train and test datasets - 80% for train and 20% for validation
t = len(e)*0.8
t = int(t)

train_data = e[0:t]
train_targets = y[0:t]

test_data = e[t:]
test_targets = y[t:]


#convert to numpy array
train_data = train_data.values
train_targets = train_targets.values

test_data = test_data.values
test_targets = test_targets.values


# Sample Multilayer Perceptron Neural Network in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
model.add(Dense(32, activation='relu'))
#model.add(Dense(1, activation='sigmoid'))
model.add(Dense(1))

# 2. compile the network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# 3. fit the network
history = model.fit(train_data, train_targets, epochs=100, batch_size=50)

# 4. evaluate the network
loss, accuracy = model.evaluate(test_data, test_targets)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

# 5. make predictions
probabilities = model.predict(test_data)
predictions = [float(x) for x in probabilities]
accuracy = numpy.mean(predictions == test_targets)
print("Prediction Accuracy: %.2f%%" % (accuracy*100))

The results are as follows:

(screenshot of training output: the loss decreases but the reported accuracy stays at 0)

Any help would be greatly appreciated.


2 Answers

Accuracy is a classification metric; using it for regression makes no sense. There is actually nothing wrong here.
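For a regression model, the meaningful number to look at is the prediction error, not accuracy. A minimal sketch of computing mean absolute and mean squared error on the test set, assuming the model, test_data and test_targets from the question:

import numpy as np

# regression is judged by error, not by accuracy
predictions = model.predict(test_data).flatten()      # continuous predicted prices
mae = np.mean(np.abs(predictions - test_targets))     # mean absolute error
mse = np.mean((predictions - test_targets) ** 2)      # mean squared error
print("Test MAE: %.4f, Test MSE: %.4f" % (mae, mse))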

First of all, when posting a question on Stack Overflow you should consider cleaning up your code. I tried to reproduce your code and ran into a few errors before I could get clean numpy arrays train_data, train_targets, test_data and test_targets.

Focusing on the machine learning side: if you do not shuffle your dataset, your regression model will have a hard time training. Try shuffling the dataset (for example with random.shuffle()) before splitting it into training and test subsets.
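One way to do that while keeping features and targets aligned is to draw a single random permutation of the row indices and apply it to both before splitting. A minimal sketch, reusing the e (features) and y (targets) objects from the question:

import numpy as np

# one shared permutation keeps rows of features and targets aligned
perm = np.random.permutation(len(e))
features = e.values[perm]       # shuffled feature matrix as a numpy array
targets = y.values[perm]        # targets shuffled in the same order

# 80/20 split on the shuffled data
t = int(len(features) * 0.8)
train_data, test_data = features[:t], features[t:]
train_targets, test_targets = targets[:t], targets[t:]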

As explained in Matias' answer, if you are working on a regression problem (rather than a classification problem), using an accuracy metric makes no sense.

Likewise, binary cross-entropy loss is only meaningful for classification, so it makes no sense here either. The typical loss for a regression model is mean squared error. Consider changing your Keras model compilation to:

model.compile(loss='mean_squared_error', optimizer='adam')
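If you also want a single number to watch during training, mean absolute error is a common regression metric. A sketch of the adjusted compile/fit/evaluate steps, using the same model and arrays as in the question (the metric choice is a suggestion, not part of the original answer):

# mean squared error as the loss, mean absolute error as a monitoring metric
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
history = model.fit(train_data, train_targets, epochs=100, batch_size=50)

loss, mae = model.evaluate(test_data, test_targets)
print("Test MSE: %.4f, Test MAE: %.4f" % (loss, mae))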

Hope this helps!
