How to fix a ValueError in Python when scaling data for machine learning?

Posted 2024-10-02 02:26:46


I'm learning machine learning. While studying the KNN algorithm, I need to scale the data. When I apply the scaling, I get a ValueError:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r'C:\Users\admin\Documents\milemarkers.csv')
df.head()
->
   OBJECTID  REF_PT_ID  HWY  REF_PT_NUM  ROUTE_ID_RIMS
0     10060     52.000   14   52.000000            192
1     10061     54.167   29   54.167000             14
2     10062    122.000   94  122.000000             15
3     10063      0.000   48    0.000000            229
4     10064    196.014   29  196.014008             14

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df)
scaled_features = scaler.transform(df)
df_feat = pd.DataFrame(scaled_features,columns = df.columns[:-1])

ValueError: Shape of passed values is (9738, 5), indices imply (9738, 4)

2 answers

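Isn't it simply because you dropped the last column? With [:-1] you pass only 4 column names, so the index implies a shape of (9738, 4), while the scaled array has shape (9738, 5). The correct form is: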

df_feat = pd.DataFrame(scaled_features,columns = df.columns)
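
The mismatch can be reproduced on a small frame. This is only a minimal sketch: it assumes the five sample rows shown in the question stand in for the full CSV, and the flag name mismatch_raised is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: the five sample rows from the question stand in for the full CSV.
df = pd.DataFrame({
    "OBJECTID": [10060, 10061, 10062, 10063, 10064],
    "REF_PT_ID": [52.0, 54.167, 122.0, 0.0, 196.014],
    "HWY": [14, 29, 94, 48, 29],
    "REF_PT_NUM": [52.0, 54.167, 122.0, 0.0, 196.014008],
    "ROUTE_ID_RIMS": [192, 14, 15, 229, 14],
})

# fit_transform returns a NumPy array with one column per input column (5 here).
scaled_features = StandardScaler().fit_transform(df)

# Four column names for five columns of data: pandas raises ValueError.
try:
    pd.DataFrame(scaled_features, columns=df.columns[:-1])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Five names for five columns: the shapes agree.
df_feat = pd.DataFrame(scaled_features, columns=df.columns)
print(mismatch_raised, df_feat.shape)
```

The number of names passed as columns has to match the second dimension of the array, whatever that number is.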

I don't think you should scale the OBJECTID column.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

Select the features and scale them:

features = df.iloc[:,1:]
scaler.fit(features)
scaled_features = scaler.transform(features)
df_scaled = pd.DataFrame(data=scaled_features,columns=df.columns[1:])

>>> scaler.mean_
array([84.8362   , 42.8      , 84.8362016, 92.8      ])
>>> scaler.scale_    #std
array([67.76082634, 27.78056875, 67.76082897, 96.81198273])
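
To see what these two attributes hold, the scaling can be redone by hand. The sketch below assumes the same five sample rows (without OBJECTID); manual_mean, manual_std and manual_scaled are illustrative names. It relies on StandardScaler using the population standard deviation (ddof=0), which differs from the pandas default of ddof=1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: the five sample rows, with the OBJECTID column already dropped.
features = pd.DataFrame({
    "REF_PT_ID": [52.0, 54.167, 122.0, 0.0, 196.014],
    "HWY": [14, 29, 94, 48, 29],
    "REF_PT_NUM": [52.0, 54.167, 122.0, 0.0, 196.014008],
    "ROUTE_ID_RIMS": [192, 14, 15, 229, 14],
})

scaler = StandardScaler().fit(features)
scaled = scaler.transform(features)

# Per-column mean and population standard deviation (ddof=0).
manual_mean = features.mean(axis=0).to_numpy()
manual_std = features.std(axis=0, ddof=0).to_numpy()

# Standardization: subtract the mean, divide by the standard deviation.
manual_scaled = (features.to_numpy() - manual_mean) / manual_std

print(np.allclose(scaler.mean_, manual_mean))
print(np.allclose(scaler.scale_, manual_std))
print(np.allclose(scaled, manual_scaled))
```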

Add back the columns from the original DataFrame that were not scaled:

missing = df.columns.difference(df_scaled.columns)
df_scaled[missing] = df[missing]
df_scaled = df_scaled.reindex(columns=df.columns)    #re-order the columns

>>> df_scaled
   OBJECTID  REF_PT_ID       HWY  REF_PT_NUM  ROUTE_ID_RIMS
0     10060  -0.484590 -1.036696   -0.484590       1.024667
1     10061  -0.452610 -0.496750   -0.452610      -0.813949
2     10062   0.548456  1.843015    0.548455      -0.803620
3     10063  -1.251995  0.187181   -1.251995       1.406851
4     10064   1.640739 -0.496750    1.640739      -0.813949
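
A more compact variant of the same idea is to copy the frame and overwrite only the scaled columns, which keeps the column order without a reindex. This is a sketch rather than the answerer's method; it assumes the five sample rows, and cols_to_scale is an illustrative name.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: the five sample rows from the question stand in for the full CSV.
df = pd.DataFrame({
    "OBJECTID": [10060, 10061, 10062, 10063, 10064],
    "REF_PT_ID": [52.0, 54.167, 122.0, 0.0, 196.014],
    "HWY": [14, 29, 94, 48, 29],
    "REF_PT_NUM": [52.0, 54.167, 122.0, 0.0, 196.014008],
    "ROUTE_ID_RIMS": [192, 14, 15, 229, 14],
})

# Every column except the identifier gets scaled.
cols_to_scale = [c for c in df.columns if c != "OBJECTID"]

# Work on a copy so the original frame stays untouched.
df_scaled = df.copy()
df_scaled[cols_to_scale] = StandardScaler().fit_transform(df[cols_to_scale])

print(df_scaled)
```

OBJECTID stays as it was, and the remaining columns are replaced in place by their standardized values.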

Data

data = '''{"OBJECTID":{"0":10060,"1":10061,"2":10062,"3":10063,"4":10064},
           "REF_PT_ID":{"0":52.0,"1":54.167,"2":122.0,"3":0.0,"4":196.014},
           "HWY":{"0":14,"1":29,"2":94,"3":48,"4":29},
           "REF_PT_NUM":{"0":52.0,"1":54.167,"2":122.0,"3":0.0,"4":196.014008},
           "ROUTE_ID_RIMS":{"0":192,"1":14,"2":15,"3":229,"4":14}}'''
from io import StringIO    #newer pandas versions deprecate passing a literal JSON string
df = pd.read_json(StringIO(data))
