Error "AttributeError: 'Py4JError' object has no attribute 'message'" while building DecisionTreeMod

import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.regression._

val rawData = sc.textFile("hdfs:///user/ds/covtype.data")

val data = rawData.map { line =>
  val values = line.split(',').map(_.toDouble)
  val featureVector = Vectors.dense(values.init)
  val label = values.last - 1
  LabeledPoint(label, featureVector)
}

val Array(trainData, cvData, testData) = data.randomSplit(Array(0.8, 0.1, 0.1))
trainData.cache()
cvData.cache()
testData.cache()

import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.tree._
import org.apache.spark.mllib.tree.model._
import org.apache.spark.rdd._

def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]): MulticlassMetrics = {
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
  )
  new MulticlassMetrics(predictionsAndLabels)
}

val model = DecisionTree.trainClassifier(trainData, 7, Map[Int,Int](), "gini", 4, 100)
val metrics = getMetrics(model, cvData)
metrics.confusionMatrix

1 Answer

User

#1 · Posted 2024-09-29 21:45:45

I think the problem is in the call to model.predict.

From pyspark mllib/tree.py:

Note: In Python, predict cannot currently be used within an RDD transformation or action. Call predict directly on the RDD instead.

What you can do instead is pass the RDD of feature vectors directly to predict, like this:

>>> rdd = sc.parallelize([[1.0], [0.0]])
>>> model.predict(rdd).collect()
[1.0, 0.0]

Edit:

An update to getMetrics could be:

[The code block from the original answer is missing here.]
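A minimal sketch of what such an update might look like in PySpark, following the note quoted above: predict is called once on the whole RDD of feature vectors, and the resulting predictions are zipped back together with the labels. This is not the answerer's exact code. The helper name prediction_label_pairs is hypothetical, and the sketch assumes that data is an RDD of LabeledPoint and that model is a trained pyspark.mllib DecisionTreeModel.

```python
def prediction_label_pairs(model, data):
    """Build an RDD of (prediction, label) pairs without calling
    model.predict inside a transformation.

    Hypothetical helper: `data` is assumed to be an RDD of LabeledPoint,
    `model` a trained pyspark.mllib DecisionTreeModel.
    """
    labels = data.map(lambda lp: lp.label)
    features = data.map(lambda lp: lp.features)
    # predict is called on the driver with the whole RDD of feature vectors,
    # not inside a lambda that runs on the workers.
    predictions = model.predict(features)
    # Both RDDs derive from `data` through map, so their partitioning matches
    # and they can be zipped element by element.
    return predictions.zip(labels)


def get_metrics(model, data):
    """Python counterpart of the Scala getMetrics from the question."""
    # Imported here so that the helper above can be used without pyspark.
    from pyspark.mllib.evaluation import MulticlassMetrics
    return MulticlassMetrics(prediction_label_pairs(model, data))
```

Assuming model and cvData were built in Python the same way as in the Scala code from the question, the call would then be get_metrics(model, cvData).confusionMatrix().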
