Error from the MLP classifier's fit method in PySpark

Posted 2024-10-03 09:06:55


I am using MultilayerPerceptronClassifier from pyspark.ml.classification. My dataset has 11 features:

['fixed acidity',
 'volatile acidity',
 'citric acid',
 'residual sugar',
 'chlorides',
 'free sulfur dioxide',
 'total sulfur dioxide',
 'density',
 'pH',
 'sulphates',
 'alcohol']

My label column has 7 distinct classes:

+-----+
|label|
+-----+
|    6|
|    3|
|    5|
|    9|
|    4|
|    8|
|    7|
+-----+

I am training a multilayer perceptron classifier on this dataset in PySpark. Following the PySpark ML convention, I specified the network structure like this:

# specify layers for the neural network:
# input layer of size 11 (features), two intermediate of size 5 and 4
# and output of size 7 (classes)
layers = [11,5,4,7]

I then define my classifier:

from pyspark.ml.classification import MultilayerPerceptronClassifier
clf = MultilayerPerceptronClassifier(labelCol='label', layers=layers)

Now I fit it on my training data:

cvModel = clf.fit(train_data)

Can anyone tell me why I am getting this error?

Error:

Py4JJavaError: An error occurred while calling o241.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 1 times, most recent failure: Lost task 0.0 in stage 47.0 (TID 812, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.spark.ml.classification.LabelConverter$.encodeLabeledPoint(MultilayerPerceptronClassifier.scala:121)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
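
A possible reading of the trace: the exception is raised in LabelConverter.encodeLabeledPoint with index 8, which is consistent with the label value itself being used as a position in an output vector of length 7 (the last entry of layers). If that is the case, the labels would need to lie in 0..6, while the labels shown above run from 3 to 9. The sketch below reproduces that situation in plain Python, without Spark. The helper encode_label and the dictionary label_to_index are hypothetical names used only for illustration and are not Spark's code.

```python
# Pure-Python illustration; no Spark is needed to run this.
# encode_label is a hypothetical stand-in for a one-hot encoding step.
def encode_label(label, output_size):
    """Return a one-hot list with 1.0 at position `label`."""
    one_hot = [0.0] * output_size
    one_hot[label] = 1.0  # raises IndexError when label >= output_size
    return one_hot

labels = [6, 3, 5, 9, 4, 8, 7]  # distinct label values from the question
output_size = 7                 # last entry of layers = [11, 5, 4, 7]

# Labels that cannot be used as an index into a vector of length 7.
out_of_range = sorted(l for l in labels if l >= output_size)

# One possible fix: remap the distinct labels to 0..6 before training.
label_to_index = {label: i for i, label in enumerate(sorted(set(labels)))}
remapped = [label_to_index[l] for l in labels]
encoded = [encode_label(i, output_size) for i in remapped]
```

In PySpark, a comparable remapping could be done with StringIndexer from pyspark.ml.feature, passing the indexed column as labelCol; note that StringIndexer orders indices by label frequency by default, not by value. Another option, under the same assumption, would be to keep the labels as they are and make the output layer large enough to cover the largest label, for example layers = [11, 5, 4, 10].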
