ValueError: data type must provide an itemsize?

Posted 2024-05-21 05:32:11


Here is my code. Every time I run it, it fails with this error:

"ValueError: data type must provide an itemsize"

I can't figure out why it doesn't work.

from sklearn.linear_model import LogisticRegression
trainX = [('2', '0.455', '0.365', '0.095', '0.514', '0.2245', '0.101', '0.15'), ('2', '0.35', '0.265', '0.09', '0.2255', '0.0995', '0.0485', '0.07'), ('1', '0.53', '0.42', '0.135', '0.677', '0.2565', '0.1415', '0.21'), ('2', '0.44', '0.365', '0.125', '0.516', '0.2155', '0.114', '0.155'), ('3', '0.33', '0.255', '0.08', '0.205', '0.0895', '0.0395', '0.055')]
trainY = ['15', '7', '9', '10', '7']
testX = [('3', '0.475', '0.36', '0.11', '0.452', '0.191', '0.099', '0.13'), ('3', '0.485', '0.37', '0.14', '0.5065', '0.2425', '0.088', '0.1465')]
model = LogisticRegression()
model.fit(trainX,trainY)
predict = model.predict(testX[0:2])#error
print predict

2 Answers

LogisticRegression requires numeric data, so first convert the data to float with numpy and then fit the model, as shown below:

>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> trainX = [('2', '0.455', '0.365', '0.095', '0.514', '0.2245', '0.101', '0.15'), ('2', '0.35', '0.265', '0.09', '0.2255', '0.0995', '0.0485', '0.07'), ('1', '0.53', '0.42', '0.135', '0.677', '0.2565', '0.1415', '0.21'), ('2', '0.44', '0.365', '0.125', '0.516', '0.2155', '0.114', '0.155'), ('3', '0.33', '0.255', '0.08', '0.205', '0.0895', '0.0395', '0.055')]
>>> trainY = ['15', '7', '9', '10', '7']
>>> testX = [('3', '0.475', '0.36', '0.11', '0.452', '0.191', '0.099', '0.13'), ('3', '0.485', '0.37', '0.14', '0.5065', '0.2425', '0.088', '0.1465')]
>>> model = LogisticRegression()
>>> trainX=np.array(trainX,dtype=float)
>>> trainY=np.array(trainY,dtype=float)
>>> testX=np.array(testX,dtype=float)
>>> model.fit(trainX,trainY)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
>>> predict = model.predict(testX[0:2])
>>> predict
array([ 7.,  7.])
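
To see what the conversion above changes, here is a minimal sketch that inspects the dtype before and after. It assumes Python 3 and a reasonably recent numpy; the shortened `rows` sample and the variable names are made up for illustration and are not part of the original data.

```python
import numpy as np

# Two shortened rows in the same string form as the question's data
# (illustrative sample, not the full training set).
rows = [('2', '0.455', '0.365'), ('3', '0.33', '0.255')]

as_strings = np.array(rows)              # numpy infers a string dtype
as_floats = np.array(rows, dtype=float)  # each string is parsed as a float

print(as_strings.dtype.kind)  # 'U' (Unicode string) on Python 3
print(as_floats.dtype.kind)   # 'f' (floating point)
print(as_floats.shape)        # (2, 3)
```

The string array is what the estimator receives when the data is passed in unconverted; the float array is the form it can work with.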

The problem is that your data contains strings rather than numbers. Change the data to:

# note the stripped 's
trainX = [(2, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15), ...] 
trainY = [15, 7, 9, 10, 7]
testX = [(3, 0.475, 0.36, 0.11, 0.452, 0.191, 0.099, 0.13),  ...]

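You may want to read up a bit on data types in Python.

If for some reason your data comes in this form and you don't want to rewrite it by hand, you can use the following conversion functions: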

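def destringifyTupleData(d):
    # Convert each row of the argument d (not the global trainX) to a tuple of floats
    return [tuple(destringifyList(l)) for l in d]

def destringifyList(l):
    # Build a real list so the result is not a lazy map object on Python 3
    return [float(x) for x in l]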

# ...

trainX = destringifyTupleData(trainX)
trainY = destringifyList(trainY)
testX = destringifyTupleData(testX)
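
If the data might contain values that are not valid numbers, a variant of the conversion that reports the offending row can make debugging easier. This is only a sketch assuming Python 3; `to_float_rows` is a hypothetical helper name, not something from sklearn or numpy.

```python
def to_float_rows(rows):
    """Convert rows of numeric strings to tuples of floats (hypothetical helper)."""
    converted = []
    for i, row in enumerate(rows):
        try:
            converted.append(tuple(float(value) for value in row))
        except ValueError as exc:
            # Re-raise with the row index so the bad record is easy to locate.
            raise ValueError("row %d has a non-numeric value: %r" % (i, row)) from exc
    return converted

# Shortened sample in the same form as the question's data (illustrative only).
sample = [('2', '0.455', '0.365'), ('3', '0.33', '0.255')]
print(to_float_rows(sample))
```

Failing early with the row index is a design choice: it surfaces the bad record at conversion time instead of as a less direct error inside `model.fit`.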
