How do I generate (original label, predicted label) tuples on Spark with MLlib?

Posted 2024-10-02 18:21:55


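I am trying to make predictions with a model I trained with MLlib on Spark. The goal is to generate (originalLabelInData, predictedLabel) tuples, which can then be used for model evaluation. What is the best way to achieve this? Thanks.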

Assume parsedTrainData is an RDD of LabeledPoint:

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils

parsedTrainData = sc.parallelize([LabeledPoint(1.0, [11.0,-12.0,23.0]), 
                                  LabeledPoint(3.0, [-1.0,12.0,-23.0])])

model = DecisionTree.trainClassifier(parsedTrainData, numClasses=7,
                                     categoricalFeaturesInfo={}, impurity='gini',
                                     maxDepth=8, maxBins=32)

model.predict(parsedTrainData.map(lambda x: x.features)).take(1)

This returns the predictions, but I am not sure how to match each prediction with the original label in the data.

I tried this:

[code snippet not preserved in the source]

However, it seems that the way I am sending the model to the workers is not valid here:

/spark140/python/pyspark/context.pyc in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.


1 Answer

Answer 1 · Posted 2024-10-02 18:21:55

Well, according to the official documentation, you can simply zip the labels and the predictions as follows:

predictions = model.predict(parsedTrainData.map(lambda x: x.features))
labelsAndPredictions = parsedTrainData.map(lambda x: x.label).zip(predictions)
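
Because predict is called here on the whole RDD from the driver, the model is never referenced inside a transformation, which is what the SPARK-5063 error above complains about. Once the (label, prediction) tuples exist, evaluating the model is mostly a matter of counting. The following is a minimal plain-Python sketch of that step, not part of the MLlib API: the helper name evaluate_pairs and the sample pairs are made up for illustration, and the list stands in for something like labelsAndPredictions.collect() on a dataset small enough to bring back to the driver.

```python
from collections import Counter


def evaluate_pairs(pairs):
    """Summarise (original_label, predicted_label) tuples.

    Returns (error_rate, confusion), where confusion counts how often
    each (label, prediction) combination occurs.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no (label, prediction) pairs to evaluate")
    # A pair is an error when the prediction differs from the label.
    wrong = sum(1 for label, pred in pairs if label != pred)
    confusion = Counter(pairs)
    return wrong / float(len(pairs)), confusion


# Hypothetical pairs, standing in for labelsAndPredictions.collect().
sample_pairs = [(1.0, 1.0), (3.0, 3.0), (3.0, 1.0), (1.0, 1.0)]
error_rate, confusion = evaluate_pairs(sample_pairs)
print(error_rate)
print(confusion[(3.0, 1.0)])
```

For larger data the same error count can stay distributed by filtering the zipped RDD for pairs whose two elements differ and dividing its count() by the total count(). Note that RDD.zip expects both RDDs to have the same number of partitions and the same number of elements in each partition; that should hold here because both sides are derived from parsedTrainData.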
