How do I normalize columns in a DataFrame and then plot a regression line?

Posted 2024-09-30 01:24:59


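I have a large DataFrame with many columns. I want to normalize several of them, all numeric, and then plot a regression between two of them. I thought the following code would do it: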

from sklearn import preprocessing
# Select the columns of interest and convert their values to floats
modDF = df[['WeightedAvg','Score','Co','Score', 'PeerGroup', 'TimeT', 'Ter', 'Spread']].values.astype(float)
# Create a min-max scaler object
min_max_scaler = preprocessing.MinMaxScaler()
# Fit the scaler to the data and transform it
x_scaled = min_max_scaler.fit_transform(modDF)
# Wrap the scaled array in a DataFrame
df_normalized = pd.DataFrame(x_scaled)


import seaborn as sns
import matplotlib.pyplot as plt
sns.regplot(x="WeightedAvg", y="Spread", data=modDF)

However, I get the following error: IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

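I ran a regression with sns.regplot without normalizing first and it worked, but the result looked odd, so I wanted to see what it looks like with normalization applied. I know how regression works; what I don't understand is the normalization step.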


1 answer
User
#1 · posted 2024-09-30 01:24:59

You don't need the line: df_normalized = pd.DataFrame(x_scaled)
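On the traceback itself: `.values.astype(float)` returns a plain NumPy array, which has no column names, so `sns.regplot(x="WeightedAvg", y="Spread", data=modDF)` most likely fails when seaborn tries to look up the string `"WeightedAvg"` in that array. If the scaled values are wrapped in a DataFrame at all, the DataFrame needs to carry the column names. The following is a minimal sketch of that idea, not the asker's actual setup: the data is synthetic, only two of the columns are used, and `plot_scaled_regression` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real df (assumption: the original data is not shown)
rng = np.random.default_rng(0)
cols = ["WeightedAvg", "Spread"]
df = pd.DataFrame(
    rng.normal(size=(50, 2)) * [10.0, 3.0] + [100.0, 5.0], columns=cols
)

# A bare NumPy array cannot be indexed by column name
arr = df[cols].values.astype(float)
try:
    arr["WeightedAvg"]
    string_index_ok = True
except IndexError:
    string_index_ok = False

# Scale, then rebuild a DataFrame that keeps the column names and the index
scaled = MinMaxScaler().fit_transform(df[cols])
df_normalized = pd.DataFrame(scaled, columns=cols, index=df.index)


def plot_scaled_regression(frame, x, y):
    """Hypothetical helper: regression plot from named DataFrame columns."""
    import seaborn as sns  # imported here so the rest runs without seaborn

    return sns.regplot(x=x, y=y, data=frame)
```

Calling `plot_scaled_regression(df_normalized, "WeightedAvg", "Spread")` should then draw the plot. Note that min-max scaling only shifts and rescales each column, so the scatter and fitted line are expected to look the same as before, apart from the axis ticks.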

If you want to run a linear regression, this should work:

import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Feature columns
cols = ['WeightedAvg', 'Score', 'Co', 'PeerGroup', 'TimeT', 'Ter', 'Spread']
# Convert each column to numeric; values that cannot be parsed become NaN
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

X = df[cols]
# Select your target variable
y = df[['target']]
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Create a min-max scaler object
min_max_scaler = preprocessing.MinMaxScaler()
# Fit the scaler on the training data only, then apply it to both sets
X_train_scaled = min_max_scaler.fit_transform(X_train)
X_test_scaled = min_max_scaler.transform(X_test)
# Fit the linear regression
reg = LinearRegression().fit(X_train_scaled, y_train)
# Predict on the test set
y_predict = reg.predict(X_test_scaled)

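If you use a train/test split, you must fit the scaler on the training data only; the test data is unknown at that point! On the test set, you may only use the scaler to transform.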
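One way to make the "fit on train, only transform on test" rule hard to get wrong is to put the scaler and the regression into a single scikit-learn pipeline. This is a sketch under assumptions, not part of the original answer: the data is synthetic (three made-up features and a noisy linear target), and `make_pipeline` is swapped in for the manual `fit_transform` / `transform` calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (assumption): 3 features, linear target plus small noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# fit() fits the scaler on X_train only, then fits the regression on the result
model = make_pipeline(MinMaxScaler(), LinearRegression())
model.fit(X_train, y_train)

# predict() and score() only apply the already-fitted scaler to the test data
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)

# The fitted scaler holds the minima and maxima of the training data
scaler = model.named_steps["minmaxscaler"]
```

Because the scaler's range comes from the training rows only, scaled test values can fall slightly outside [0, 1]; that is expected and not a sign of a bug.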
