一维阵列上的支持向量机

2024-10-06 12:05:02 发布

您现在位置:Python中文网/ 问答频道 /正文

我在摆弄泰坦尼克号的数据集。我尝试使用以下代码将SVM应用于许多单独的功能:

quanti_vars = ['Age','Pclass','Fare','Parch']

imp_med = SimpleImputer(missing_values=np.nan, strategy='median')
imp_med.fit(titanic[['Age']])

for i in (X_train, X_test):
    i[['Age']] = imp_med.transform(i[['Age']])

svm_clf = SVC()
svm_clf.fit(X_train[quanti_vars], y_train)
y_pred = svm_clf.predict(X_test[quanti_vars])
svm_accuracy = accuracy_score(y_pred, y_test)
svm_accuracy

for i in quanti_vars:
    svm_clf.fit(X_train[i], y_train)
    y_pred = svm_clf.predict(X_test[i])
    svm_accuracy = accuracy_score(y_pred, y_test)
    print(i,': ',svm_accuracy)

最后的for循环抛出了一个ValueError: Expected 2D array, got 1D array instead错误,我不知道为什么——SVM不能对单个特征进行操作吗


Tags: intestforagetrainmedvarsfit
2条回答

我意识到,非常简单,我需要将i放在两个括号中以正确地表示子集。因此:

for i in quanti_vars:
    svm_clf.fit(X_train[[i]], y_train)
    y_pred = svm_clf.predict(X_test[[i]])
    svm_accuracy = accuracy_score(y_pred, y_test)
    print(i,': ',svm_accuracy)

产生

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Age :  0.5874125874125874
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Pclass :  0.5874125874125874
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Fare :  0.42657342657342656
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Parch :  0.6153846153846154

(我不会假装它很好,但至少它起作用了。)

那很容易 只需写下以下内容:

y_pred = svm_clf.predict([X_test[i]])

添加[]将其转换为二维数组 而且最好是把票价带到10级,而不是直接使用 bcs 30美元和50美元之间有巨大的差异,但随着价格的上涨,差异会逐渐消失 例如,300美元和500美元之间没有太大差别

相关问题 更多 >