Is there a way to load a txt file into LinearRegression() in Python and get the prediction for the mean feature values using mean()?

Posted 2024-09-28 17:23:11


I'm using pandas to load data from a txt file. Can someone tell me what's wrong with my code?

import numpy as np
import pandas as pd
import sklearn.linear_model

wineQuality = pd.read_csv('winequality-all.txt', sep=",")

X = wineQuality.loc[:, ["fixed.acidity", "volatile.acidity", "citric.acid", "residual.sugar",
                        "chlorides", "free.sulfur.dioxide", "total.sulfur.dioxide", "density",
                        "pH", "sulphates", "alcohol", "color"]]
y = wineQuality.loc[:, 'response']
X = X.drop(['color'], axis=1)
X = X.to_numpy()
y = y.to_numpy()
print(X)
print(y)
print(X.shape)
print(y.shape)
np.matmul(X, y)  # the ValueError shown below is raised on this line
mnk = sklearn.linear_model.LinearRegression().fit(X, y)
print('Score :', mnk.score(X, y))
print('Avg values :', mnk.predict(X.mean().reshape(1, -1)))

My winequality-all.txt file looks like this:

"fixed.acidity","volatile.acidity","citric.acid","residual.sugar","chlorides","free.sulfur.dioxide","total.sulfur.dioxide","density","pH","sulphates","alcohol","response","color"
7.4,0.7,0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,3,"red"
7.8,0.88,0,2.6,0.098,25,67,0.9968,3.2,0.68,9.8,3,"red"
7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8,3,"red"
...
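Because the file is plain comma-separated text with a quoted header row, pd.read_csv can parse it as-is. A minimal sketch, assuming only the three sample rows shown above (the real file continues past the `...`), read from an in-memory string with io.StringIO instead of from disk:

```python
import io

import pandas as pd

# Only the three sample rows from the question; the real file has more rows.
sample = '''"fixed.acidity","volatile.acidity","citric.acid","residual.sugar","chlorides","free.sulfur.dioxide","total.sulfur.dioxide","density","pH","sulphates","alcohol","response","color"
7.4,0.7,0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,3,"red"
7.8,0.88,0,2.6,0.098,25,67,0.9968,3.2,0.68,9.8,3,"red"
7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8,3,"red"
'''

# sep="," is already the default; the quotes around names and "red" are stripped
wine = pd.read_csv(io.StringIO(sample))

# Features: every column except the target and the text column "color"
X = wine.drop(columns=["response", "color"])
y = wine["response"]

print(X.shape)     # (3, 11)
print(y.tolist())  # [3, 3, 3]
```

Dropping the columns by name avoids retyping the eleven feature names in `.loc`.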

I tried calling reshape(-1, 1) or reshape(1, -1) on my X and y, but it didn't help.

Output:

  • [[7.4 0.7 0. ... 3.51 0.56 9.4] [7.8 0.88 0. ... 3.2 0.68 9.8] [7.8 0.76 0.04 ... 3.26 0.65 9.8] [6.5 0.24 0.19 ... 2.99 0.46 9.4] [5.5 0.29 0.3 ... 3.34 0.38 12.8] [6. 0.21 0.38 ... 3.26 0.32 11.8]]
  • [3 3 ... 4 5 4]
  • (5320, 11)
  • (5320,)

My error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 5320 is different from 11)
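The mismatch can be reproduced with small made-up arrays in the same layout (4 rows instead of 5320, still 11 columns). The sketch below is only an illustration of the shapes involved; it also shows that transposing X is what makes a product with y well-defined, in case a product of the two was actually intended:

```python
import numpy as np

# Made-up stand-ins with the question's layout: n rows, 11 feature columns
n_rows, n_features = 4, 11
X = np.ones((n_rows, n_features))
y = np.arange(n_rows, dtype=float)

try:
    np.matmul(X, y)  # (4, 11) @ (4,): inner sizes 11 and 4 differ
    matmul_failed = False
except ValueError as err:
    matmul_failed = True
    print("ValueError:", err)

# Transposing X makes the inner dimensions agree: (11, 4) @ (4,) -> (11,)
print((X.T @ y).shape)  # (11,)
```

LinearRegression().fit(X, y) does not need this product at all; it accepts X with shape (n_samples, n_features) and y with shape (n_samples,) directly.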

1 answer
User
#1 · Posted 2024-09-28 17:23:11

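The ValueError itself comes from np.matmul(X, y): X has shape (5320, 11) and y has shape (5320,), so the inner dimensions (11 and 5320) don't match. That line isn't needed for the regression, so remove it. The next problem is the mean: with a numpy array you have to pass axis=0 to mean(), otherwise it returns the grand mean of the whole array (a single number) instead of one mean per column: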

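import numpy as np
import sklearn.linear_model

# Random stand-in data: 20 samples, 5 features
X = np.random.normal(0, 1, (20, 5))
y = np.random.normal(0, 1, 20)

mnk = sklearn.linear_model.LinearRegression().fit(X, y)
# axis=0 gives one mean per column; reshape(1, -1) turns it into a single sample (row)
print(mnk.predict(X.mean(axis=0).reshape(1, -1)))
# Same value computed by hand: coef_ . mean(X) + intercept_
print(np.matmul(mnk.coef_, X.mean(axis=0)) + mnk.intercept_)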
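The effect of the axis argument is easiest to see from the shapes. A quick sketch on a made-up 3×2 array; a model fitted on 11 features expects input of shape (1, 11), so the (1, 1) array produced by X.mean().reshape(1, -1) in the question makes predict() raise a ValueError:

```python
import numpy as np

# Made-up data: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

grand = X.mean()          # mean over all 6 values -> a single number
per_col = X.mean(axis=0)  # mean of each column -> one value per feature

print(grand)                         # 11.0
print(per_col.tolist())              # [2.0, 20.0]
print(grand.reshape(1, -1).shape)    # (1, 1): one sample with 1 feature
print(per_col.reshape(1, -1).shape)  # (1, 2): one sample with 2 features
```

This is also why reshaping alone can't fix it: no reshape turns the single grand mean into one value per feature.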

Alternatively, keep the data as a DataFrame:

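import numpy as np
import pandas as pd
import sklearn.linear_model

df = pd.DataFrame(np.random.normal(0, 1, (20, 6)), columns=['y', 'x1', 'x2', 'x3', 'x4', 'x5'])
X = df[['x1', 'x2', 'x3', 'x4', 'x5']]
y = df['y']

mnk = sklearn.linear_model.LinearRegression().fit(X, y)
# X.mean(axis=0) is a Series indexed by column name; .T turns it into a one-row DataFrame
print(mnk.predict(pd.DataFrame(X.mean(axis=0)).T))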
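On a DataFrame, X.mean() already averages each column (axis=0 is the default), and Series.to_frame().T is an equivalent way to build the one-row input with the training column names. The sketch below also checks a property of least squares with an intercept (fit_intercept=True, the default): the prediction at the column means equals the mean of y, because the fitted plane passes through the point of means. The random data and seed are arbitrary:

```python
import numpy as np
import pandas as pd
import sklearn.linear_model

# Arbitrary random stand-in data (seed fixed only for reproducibility)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(0, 1, (20, 6)),
                  columns=['y', 'x1', 'x2', 'x3', 'x4', 'x5'])
X = df[['x1', 'x2', 'x3', 'x4', 'x5']]
y = df['y']

mnk = sklearn.linear_model.LinearRegression().fit(X, y)

# Column means as a one-row DataFrame with the same columns used in fit()
row = X.mean().to_frame().T
print(row.shape)  # (1, 5)

at_mean = mnk.predict(row)[0]
# With an intercept, the OLS plane passes through (mean(X), mean(y))
print(np.isclose(at_mean, y.mean()))  # True
```

Applied to the question's file, the same pattern would be X = wineQuality.drop(columns=['response', 'color']) and y = wineQuality['response'], and the "Avg values" prediction equals wineQuality['response'].mean() up to floating-point rounding.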
