Improving a linear regression POC with sklearn and pandas

Posted 2024-09-24 00:32:06


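Basically, I have put together a proof of concept with a linear regression model to check the coefficient of determination (as a percentage) on a specific dataset. At a high level, before building the model I preprocessed the dataset to make sure that every column needed as input is numeric and normalized.

The dataset overview shows that all columns are numeric and correctly formatted. Predictors:

Target:

I ran a describe to get more detail and verify the values again. (Predictors in red, target in yellow.)

Deploying the model: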

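from sklearn import linear_model, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X and y are assumed to be NumPy arrays holding the predictors and the target

# Split into training and test sets
# (test_size=0.80 holds out 80% of the rows for testing and trains on only 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.80, random_state=33)

# Fit the scalers on the training data only, then scale the training data
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train.reshape(-1, 1))
X_train = scalerX.transform(X_train)
y_train = scalery.transform(y_train.reshape(-1, 1))

# Scale the test data with the scalers fitted on the training data
X_test = scalerX.transform(X_test)
y_test = scalery.transform(y_test.reshape(-1, 1))

# Create the linear regression model
# ('squared_loss' was renamed 'squared_error' in scikit-learn 1.0)
clf_sgd = linear_model.SGDRegressor(loss='squared_loss', penalty=None, random_state=33)
#clf_sgd = LinearRegression()

# Fit the model and report R^2 on the training set
clf_sgd.fit(X_train, y_train.ravel())
print("Coefficient de determination:", clf_sgd.score(X_train, y_train))
# Model performance: R^2 on the test set
y_pred = clf_sgd.predict(X_test)
print("Coefficient de determination:{0:.3f}".format(metrics.r2_score(y_test, y_pred)))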

Unfortunately, my results are very poor.

I would welcome any ideas on how to improve my model; I don't have much experience in this area. Thank you very much.


1 answer

Answer 1 · Posted 2024-09-24 00:32:06

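There are two things you can improve:

1) The hyperparameters of the linear model need to be configured properly. scikit-learn's SGDRegressor is very sensitive to the values chosen for several parameters, the most important being alpha, penalty, loss and {}. Read up on a technique called cross-validation and use it to determine reasonable values for these parameters from your data.

2) Except in very special cases, you don't really need to scale the target variable y.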
