ValueError: could not convert string to float when opening a CSV file with Pandas

Posted 2024-06-24 12:01:15


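I am trying to open a CSV dataset for decision tree learning. When I run the code, I get a ValueError. I think the problem is the commas, but I don't know how to handle them.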

import pandas as pd
from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split 
from sklearn import metrics 

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

pima = pd.read_csv(r'D:\MachinLearning\MyDataSets_Implementations\pima-indians-diabetes.csv', header=None, names=col_names)

pima.head()

Some rows of the dataset look like this:

    Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
    6,148,72,35,0,33.6,0.627,50,1
    1,85,66,29,0,26.6,0.351,31,0
    8,183,64,0,0,23.3,0.672,32,1
    1,89,66,23,94,28.1,0.167,21,0
    0,137,40,35,168,43.1,2.288,33,1
    5,116,74,0,0,25.6,0.201,30,0
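A minimal sketch that reproduces the error, assuming the file's first line is the header row shown above. It uses a few of the sample records through io.StringIO instead of the real file on disk. With header=None, pandas treats the header line as an ordinary data row, so every column is parsed as text; the float conversion that scikit-learn performs on its input then fails with the ValueError from the title.

```python
import io

import pandas as pd

# A few records from the question, one per line, preceded by the file's header line.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

# header=None says "this file has no header", so the header line becomes data row 0.
pima = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)

print(pima.iloc[0].tolist())  # row 0 holds 'Pregnancies', 'Glucose', ... as values
print(pima.dtypes)            # every column is text, not int64/float64

# A scikit-learn estimator converts its input to floats in fit(); doing the same here
# fails on the header text that ended up among the data values.
error_message = None
try:
    pima['glucose'].astype(float)
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)
```

Checking pima.dtypes right after loading is a quick way to spot this: a column that should be numeric but shows up as object (text) usually contains at least one non-numeric value.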

1 answer
Anonymous user
Answer #1 · posted 2024-06-24 12:01:15

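I didn't get any error at all. Explicitly defining the separator with pd.read_csv(..., sep=",") may help. You should also pass skiprows=1 to that function so that the file's header line is not read as the first data row. Otherwise, with header=None, that line becomes a row of strings, every column is parsed as text, and the float conversion scikit-learn performs when fitting the tree fails with the ValueError above.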

import pandas as pd
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

csv_path = r'D:\MachinLearning\MyDataSets_Implementations\pima-indians-diabetes.csv'
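# Skip the file's own header line and label the columns with col_names instead.
# (header=1 together with skiprows=1 would also discard the first two data rows.)
pima = pd.read_csv(csv_path, header=None, names=col_names, sep=",", skiprows=1)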
print(pima.head())

Output:


pregnant  glucose  bp  ...  pedigree  age  label 
6 148 72 35 0   33.6 0.627 50      1 1       85  66  ...     0.351   31      0
8 183 64 0  0   23.3 0.672 32      1 1       89  66  ...     0.167   21      0
0 137 40 35 168 43.1 2.288 33      1 5      116  74  ...     0.201   30      0

[3 rows x 9 columns]
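A minimal sketch of the fix, again assuming the first line of the file is a header and substituting an in-memory string (io.StringIO) for the real path. The records are the question's sample, one per line. There are two equivalent options: keep header=None and skip the header line with skiprows=1, as suggested above, or pass header=0 so pandas reads that line as the header and names=col_names replaces it. Both produce numeric columns.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# The question's sample records, one per line, after the file's header line.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,0
"""

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

# Option 1: the file "has no header" (header=None), but skip its first line anyway.
pima_skip = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names, skiprows=1)

# Option 2: line 0 is the header (header=0); names=col_names overrides its labels.
pima = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)

print(pima.dtypes)  # int64 / float64 columns, no text

# Split features and target, as one would before DecisionTreeClassifier.fit(X, y).
X = pima.drop(columns='label')
y = pima['label']
print(X.shape, y.tolist())
```

With either option, X.astype(float) should succeed, so the DecisionTreeClassifier from the question can then be fit on X and y.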
