In Python 3.6, I want to use sklearn tree for classification, but I get ValueError: could not convert string to float: 'NC'

Posted 2024-10-01 04:59:54


Here is my code:

import os
import pandas as pd
import numpy as np
import pylab as pl
from sklearn import tree
os.chdir('C:/Users/Shinelon/Desktop/ch13')
w=pd.read_table('cup98lrn.txt',sep=',',low_memory=False)
w1=(w.loc[:,['AGE','AVGGIFT','CARDGIFT','CARDPM12','CARDPROM','CLUSTER2','DOMAIN','GENDER','GEOCODE2','HIT',
           'HOMEOWNR','HPHONE_D','INCOME','LASTGIFT','MAXRAMNT',
           'MDMAUD_F','MDMAUD_R','MINRAMNT','NGIFTALL','NUMPRM12',
           'RAMNTALL',
           'RFA_2A','RFA_2F','STATE','TIMELAG','TARGET_B']]).dropna(how='any')
x=w1.loc[:,['AGE','AVGGIFT','CARDGIFT','CARDPM12','CARDPROM','CLUSTER2','DOMAIN','GENDER','GEOCODE2','HIT',
           'HOMEOWNR','HPHONE_D','INCOME','LASTGIFT','MAXRAMNT',
           'MDMAUD_F','MDMAUD_R','MINRAMNT','NGIFTALL','NUMPRM12',
           'RAMNTALL',
           'RFA_2A','RFA_2F','STATE','TIMELAG']]
y=w1.loc[:,['TARGET_B']]
clf=tree.DecisionTreeClassifier(min_samples_split=1000,min_samples_leaf=400,max_depth=10)
print(w1.head())
clf=clf.fit(x,y)

But I get an error that I don't understand, since I have used sklearn.tree before. This is the console output:

D:\Python3.6\python.exe C:/Users/Shinelon/Desktop/ch13/.idea/13.4.py

     AGE    AVGGIFT  CARDGIFT  CARDPM12  CARDPROM  CLUSTER2 DOMAIN GENDER  \
1   46.0  15.666667         1         6        12       1.0     S1      M   
3   70.0   6.812500         7         6        27      41.0     R2      F   
4   78.0   6.864865         8        10        43      26.0     S2      F   
6   38.0   7.642857         8         4        26      53.0     T2      F   
11  75.0  12.500000         2         6         8      23.0     S2      M   

   GEOCODE2  HIT    ...    MDMAUD_R  MINRAMNT  NGIFTALL  NUMPRM12  RAMNTALL  \
1         A   16    ...           X      10.0         3        13      47.0   
3         C    2    ...           X       2.0        16        14     109.0   
4         A   60    ...           X       3.0        37        25     254.0   
6         D    0    ...           X       3.0        14         9     107.0   
11        B    3    ...           X      10.0         2        12      25.0   

   RFA_2A RFA_2F  STATE  TIMELAG  TARGET_B  
1       G      2     CA     18.0         0  
3       E      4     CA      9.0         0  
4       F      2     FL     14.0         0  
6       E      1     IN      4.0         0  
11      F      2     IN      3.0         0 

This is the output of print(w1.head()).
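The head() output shows several columns that hold strings (DOMAIN, GENDER, GEOCODE2, MDMAUD_R, RFA_2A, STATE). DecisionTreeClassifier.fit expects numeric input, so these columns are a likely source of the ValueError. The value 'NC' may come from one of them, for example STATE, although the rows shown do not confirm that. A minimal sketch of one possible approach, one-hot encoding with pd.get_dummies, is below. The small `demo` frame and its values are hypothetical stand-ins for w1, not data from cup98lrn.txt.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a few columns of w1; the values are illustrative only.
demo = pd.DataFrame({
    "AGE": [46.0, 70.0, 78.0, 38.0, 75.0, 52.0],
    "GENDER": ["M", "F", "F", "F", "M", "M"],
    "STATE": ["CA", "CA", "FL", "IN", "IN", "NC"],
    "TARGET_B": [0, 0, 0, 1, 0, 1],
})

features = demo.drop(columns="TARGET_B")
target = demo["TARGET_B"]  # a 1-D Series rather than a one-column DataFrame

# Non-numeric columns are the ones fit() cannot convert to float.
string_cols = features.select_dtypes(exclude="number").columns.tolist()
print(string_cols)

# One-hot encode the string columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(features, columns=string_cols)

# Small max_depth only because the demo frame is tiny.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(encoded, target)
print(clf.predict(encoded.head(2)))
```

With the real data, the same steps would apply to x and y. One-hot encoding can produce many columns for high-cardinality fields, so an integer encoding such as sklearn.preprocessing.OrdinalEncoder may be worth trying as an alternative.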

