How do I get precision/recall/ROC from a TrainValidationSplit in PySpark?

Posted 2024-09-27 20:15:43

Here is how I currently evaluate LinearSVC over different parameters and pick the best ones:

from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LinearSVC
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

tokenizer = Tokenizer(inputCol="Text", outputCol="words")
wordsData = tokenizer.transform(df)

hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures")
featurizedData = hashingTF.transform(wordsData)

idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)

LSVC = LinearSVC()

rescaledData = idfModel.transform(featurizedData)

paramGrid = ParamGridBuilder()\
                            .addGrid(LSVC.maxIter, [1])\
                            .addGrid(LSVC.regParam, [0.001, 10.0])\
                            .build()

crossval = TrainValidationSplit(estimator=LSVC,
                                estimatorParamMaps=paramGrid,
                                evaluator=MulticlassClassificationEvaluator(metricName="weightedPrecision"),
                                trainRatio=0.99)  # TrainValidationSplit takes trainRatio, not testRatio

cvModel = crossval.fit(rescaledData.select("KA", "features").selectExpr("KA as label", "features as features"))

bestModel = cvModel.bestModel

Now I would like to get the model's basic evaluation metrics (such as precision and recall). How do I get these?


Tags: parameters, transform, tokenizer, features, words, idf, linearsvc, rawfeatures
1 Answer

#1 · Posted 2024-09-27 20:15:43
You can try this

from pyspark.mllib.evaluation import MulticlassMetrics


# Instantiate metrics object; predictionAndLabels must be an RDD of
# (prediction, label) pairs -- see the sketch below for one way to build it
metrics = MulticlassMetrics(predictionAndLabels)

# Overall statistics
precision = metrics.precision()
recall = metrics.recall()
f1Score = metrics.fMeasure()
print("Summary Stats")
print("Precision = %s" % precision)
print("Recall = %s" % recall)
print("F1 Score = %s" % f1Score)

You can check this link for more information:

https://spark.apache.org/docs/2.1.0/mllib-evaluation-metrics.html
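
For the ROC part of the question: LinearSVC is a binary classifier, so you can also use the DataFrame-based BinaryClassificationEvaluator from pyspark.ml. A minimal sketch, assuming the predictions DataFrame built in the sketch above:

from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Uses the "rawPrediction" column that LinearSVC adds by default
roc_evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
print("Area under ROC = %s" % roc_evaluator.evaluate(predictions))

The same evaluator reports area under the precision-recall curve with metricName="areaUnderPR".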
