I have a pandas DataFrame (called df) that looks like this when printed in the console:
date 2019-09-03 00:00:00 ... OverallAtt
students ...
5c48943cbe8e95292564e163 0.0 ... 78.321678
5c48943dbe8e95292564e165 100.0 ... 87.500000
5c48943dbe8e95292564e166 100.0 ... 86.713287
5c48943dbe8e95292564e167 100.0 ... 95.804196
5c48943dbe8e95292564e169 100.0 ... 100.000000
5c48943dbe8e95292564e16b 100.0 ... 98.601399
5c48943dbe8e95292564e16d 100.0 ... 85.314685
5c48943dbe8e95292564e173 100.0 ... 96.503497
5c48943dbe8e95292564e175 100.0 ... 83.216783
However, when I try to select the students column and put it in a separate variable, like this:
Names = df['students']
I get the following error:
KeyError: 'students'
Does anyone know why this doesn't work?
Update
That is fixed, but now I get another error when I try to print the predicted values. Here is my code:
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
y = dataset['OverallAtt'] #Total Attendance ThisYear
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
import pickle
filename = 'Regressor_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(regressor, f)
with open(filename, 'rb') as f:
    load_lr_model = pickle.load(f)
#PREDICT FROM NEW DATA
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
ActualAttendance = dataset['OverallAtt']
Names = df.reset_index(drop=False)['students']
NewX_test = (X)
y_load_predit=load_lr_model.predict(NewX_test)
Newdf = pd.DataFrame({'Full Name': Names, 'Actual Attendance': ActualAttendance, 'Predicted Attendance': y_load_predit})
print(Newdf)
I get this error:
ValueError: array length 77 does not match index length 459
ActualAttendance and Names both have length 382. y_load_predit is also an array of length 382. So I don't know why I'm getting this error.
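One possible explanation, offered as a sketch rather than a confirmed diagnosis: when pandas builds a DataFrame from a dict that mixes Series and plain arrays, it aligns the Series on the union of their indexes, and any plain array must then match that combined length. In the code above, Names carries a fresh 0..n-1 index (because of reset_index) while ActualAttendance is still indexed by students, so the two indexes share no labels. The reported numbers would fit that pattern (459 = 382 + 77), which suggests that one of the inputs had 77 rows when the error was raised, although the post does not confirm this. The tiny frame and the `predictions` array below are hypothetical stand-ins for df and y_load_predit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three students, index named "students".
df = pd.DataFrame(
    {"OverallAtt": [78.3, 87.5, 86.7]},
    index=pd.Index(["s1", "s2", "s3"], name="students"),
)
predictions = np.array([80.0, 85.0, 90.0])  # stand-in for y_load_predit

names = df.reset_index()["students"]  # indexed 0, 1, 2
actual = df["OverallAtt"]             # indexed s1, s2, s3

# The two Series have disjoint indexes, so pandas builds a 6-label union
# and the 3-element array no longer fits.
try:
    pd.DataFrame({
        "Full Name": names,
        "Actual Attendance": actual,
        "Predicted Attendance": predictions,
    })
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing plain arrays (no index to align on) sidesteps the problem.
result = pd.DataFrame({
    "Full Name": df.index.to_numpy(),
    "Actual Attendance": actual.to_numpy(),
    "Predicted Attendance": predictions,
})
print(error_message)
print(result)
```

If this is what is happening, checking len() and .index of each input right before building Newdf should show which one disagrees.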
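It looks like students is the name of your index, not a column. To get it as a column, you can reset the index, for example df.reset_index()['students'].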
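As a minimal sketch of the point above: in the printed output, students sits on its own row beneath the column header, which is how pandas displays an index name. The small frame below is a hypothetical reconstruction using two of the IDs from the question; only the index name and the OverallAtt column are taken from the post.

```python
import pandas as pd

# Hypothetical reconstruction: "students" is the index name, not a column.
df = pd.DataFrame(
    {"OverallAtt": [78.321678, 87.500000]},
    index=pd.Index(
        ["5c48943cbe8e95292564e163", "5c48943dbe8e95292564e165"],
        name="students",
    ),
)

is_column = "students" in df.columns  # df['students'] would raise KeyError
index_name = df.index.name            # the label printed above the row IDs

# Option 1: turn the index into an ordinary column, then select it.
names_from_reset = df.reset_index()["students"]

# Option 2: read the labels straight from the index.
names_from_index = df.index.to_list()

print(is_column, index_name)
print(names_from_reset)
```

Reading df.index directly avoids creating a second Series with a different index, which can matter when the names are later combined with other columns.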