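<p>I'm trying to train a model using data read from a CSV file as the training data. To do this, I one-hot encode the categorical features and then pass the resulting arrays of 1s and 0s into <code>features</code>, alongside the plain numeric features.</p>
<p>I have the following code:</p>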
<pre><code>import csv

import joblib  # older scikit-learn: from sklearn.externals import joblib
import numpy as np
import pandas as pd
from sklearn import linear_model, preprocessing

X = pd.read_csv('Data2Cut.csv')
# Categorical (string) columns only
Y = X.select_dtypes(include=[object])
# Integer-code each categorical column, then one-hot encode the codes
le = preprocessing.LabelEncoder()
Y_2 = Y.apply(le.fit_transform)
enc = preprocessing.OneHotEncoder()
enc.fit(Y_2)
onehotlabels = enc.transform(Y_2).toarray()
onehotlabels.shape
features = []
labels = []
with open('Data2Cut.csv') as f:
    # csv.reader also returns the header line that pd.read_csv used for column names
    mycsv = csv.reader(f)
    indexCount = 0
    for row in mycsv:
        if indexCount < 8426:
            # onehotlabels[indexCount] is a whole 1-D array; the other entries are single values
            features.append([onehotlabels[indexCount], row[1], row[2], row[3], row[6], row[8], row[9], row[10], row[11]])
            labels.append(row[12])
        indexCount = indexCount + 1
training_data = np.array(features, dtype = 'float_')
training_labels = np.array(labels, dtype = 'float_')
log = linear_model.LogisticRegression()
log = log.fit(training_data, training_labels)
joblib.dump(log, "modelLogisticRegression.pkl")
</code></pre>
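<p>A minimal sketch of what the encoding steps above produce, using a hypothetical two-column toy frame (the column names <code>device</code> and <code>region</code> and the value <code>desktop</code> are assumptions, not taken from <code>Data2Cut.csv</code>). Each row of <code>onehotlabels</code> is a 1-D array of 0s and 1s with one entry per category, across all categorical columns:</p>
<pre><code>import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data standing in for the object columns of Data2Cut.csv
Y = pd.DataFrame({
    "device": ["mobile", "desktop", "mobile"],
    "region": ["west coast", "east coast", "east coast"],
})

# Same two steps as in the question: integer-code each column, then one-hot encode
le = preprocessing.LabelEncoder()
Y_2 = Y.apply(le.fit_transform)
enc = preprocessing.OneHotEncoder()
onehotlabels = enc.fit_transform(Y_2).toarray()

print(onehotlabels.shape)  # (3, 4): 2 device columns + 2 region columns
print(onehotlabels[0])     # a whole 1-D array, not a single number
</code></pre>
<p>Since scikit-learn 0.20, <code>OneHotEncoder</code> also accepts string categories directly, so the <code>LabelEncoder</code> step is optional.</p>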
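<p>It seems to get as far as the line <code>training_data = np.array(features, dtype = 'float_')</code> before crashing with the following error:</p>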
<pre><code>ValueError: setting an array element with a sequence.
</code></pre>
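<p>I assume this happens because the one-hot encoded values are arrays rather than floats. How can I change this code so that it handles both categorical and numeric features as training data?</p>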
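<p>The guess above matches how NumPy behaves: with <code>dtype=float</code>, every element of every row must be a single number, and a row whose first element is itself an array cannot be stored in a regular 2-D array. A minimal sketch, assuming one encoded row and three numeric CSV fields borrowed from the example row below, reproduces the error and shows one way around it: concatenating the one-hot array and the numeric values into a single flat float vector per row.</p>
<pre><code>import numpy as np

onehot_row = np.array([0.0, 1.0, 0.0, 1.0])  # hypothetical row of onehotlabels
numeric_row = ["1498885897", "17491407", "23911"]  # csv.reader yields strings

# The question's layout: an array nested next to scalars in the same row
ragged = [[onehot_row, *numeric_row]]
try:
    np.array(ragged, dtype=float)
except ValueError as err:
    print("ValueError:", err)

# One flat float vector per row: one-hot entries followed by the numeric fields
row = np.concatenate([onehot_row, np.asarray(numeric_row, dtype=float)])
training_data = np.array([row])
print(training_data.shape)  # (1, 7)
</code></pre>
<p>Applied to the loop in the question, this would mean appending <code>np.concatenate([onehotlabels[indexCount], np.asarray(row_values, dtype=float)])</code> instead of a list that nests the array; <code>row_values</code> is a hypothetical name for the selected numeric fields.</p>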
<p>Edit: here is an example of a row I'm feeding in, where each column is one feature:</p>
<pre><code>mobile, 1498885897, 17491407, 23911, west coast, 2, seagull, 18, 41.0666666667, [0.325, 0.35], [u'text', u'font', u'writing', u'line'], 102, 5
#...
</code></pre>
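<p>Another option, sketched here under assumptions, avoids re-reading the file with <code>csv.reader</code> and keeps the categorical and numeric parts of each row aligned by taking both from the same DataFrame and joining them with <code>np.hstack</code>. The toy frame and its column names (<code>device</code>, <code>region</code>, <code>count</code>, <code>score</code>, <code>label</code>) are hypothetical stand-ins for <code>Data2Cut.csv</code>. Columns holding lists, such as <code>[0.325, 0.35]</code> in the example row, would most likely be read by <code>pd.read_csv</code> as plain strings and then treated as categories by <code>select_dtypes(include=[object])</code>; they probably need to be parsed or expanded into separate numeric columns first, which this sketch does not attempt.</p>
<pre><code>import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for pd.read_csv('Data2Cut.csv'); column names are assumptions
X = pd.DataFrame({
    "device": ["mobile", "desktop", "mobile", "desktop"],
    "region": ["west coast", "east coast", "east coast", "west coast"],
    "count": [239, 120, 53, 87],
    "score": [41.07, 12.5, 30.0, 8.2],
    "label": [5, 3, 5, 3],
})

y = X["label"].to_numpy()  # target column

# Take both parts from the same frame so row i always refers to the same record
categorical = X[["device", "region"]]
numeric = X[["count", "score"]].to_numpy(dtype=float)

# OneHotEncoder accepts string categories directly (scikit-learn 0.20 and later)
enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(categorical).toarray()

# One flat float matrix: one-hot columns first, then the numeric columns
training_data = np.hstack([onehot, numeric])

log = LogisticRegression(max_iter=1000)
log.fit(training_data, y)
print(training_data.shape)  # (4, 6)
</code></pre>
<p>Keeping the fitted <code>enc</code> alongside the model matters here: new rows at prediction time have to be encoded with the same encoder so the columns line up.</p>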