AttributeError: 'PipelineModel' object has no attribute 'fitMultiple'

from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline

# Create a spark RandomForestClassifier using all default parameters.
# Create a training, and testing df
training_df, testing_df = raw_data_df.randomSplit([0.6, 0.4])

# build a pipeline for analysis
va = VectorAssembler().setInputCols(training_df.columns[0:110:]).setOutputCol('features')

# featuresCol="features"
rf = RandomForestClassifier(labelCol="quality")

# Train the model and calculate the AUC using a BinaryClassificationEvaluator
rf_pipeline = Pipeline(stages=[va, rf]).fit(training_df)
bce = BinaryClassificationEvaluator(labelCol="quality")

# Check AUC before tuning
bce.evaluate(rf_pipeline.transform(testing_df))

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

paramGrid = ParamGridBuilder().build()
crossValidator = CrossValidator(estimator=rf_pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=bce,
                          numFolds=3)
model = crossValidator.fit(training_df)

1 answer

User

#1 · Posted 2024-06-02 23:46:58

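The estimator passed to CrossValidator must be an unfitted Pipeline object, not a PipelineModel. Calling .fit() on a Pipeline returns a PipelineModel, which is an already-fitted transformer and has no fitMultiple method, hence the AttributeError.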

See this example for reference: https://github.com/apache/spark/blob/master/examples/src/main/python/ml/cross_validator.py
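To see why a fitted model triggers this error, here is a minimal pure-Python sketch of the estimator/model split. ToyEstimator, ToyModel and toy_cross_validate are hypothetical stand-ins invented for illustration; they are not PySpark classes and do not reproduce Spark's actual implementation. The only assumption carried over from the error message is that the cross-validator calls fitMultiple on whatever it receives as its estimator.

```python
# Toy stand-ins for illustration only; these are NOT PySpark classes.

class ToyModel:
    """Result of fitting: it can transform data but cannot be fitted again."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, data):
        # Center the data around the mean learned during fitting.
        return [x - self.mean for x in data]


class ToyEstimator:
    """Unfitted recipe: it exposes fit() and fitMultiple()."""

    def fit(self, data):
        return ToyModel(sum(data) / len(data))

    def fitMultiple(self, data, param_maps):
        # One fitted model per parameter map (the maps are ignored in this toy).
        return [(i, self.fit(data)) for i, _ in enumerate(param_maps)]


def toy_cross_validate(estimator, data, param_maps):
    """Refits the estimator itself, so the argument must provide fitMultiple()."""
    return estimator.fitMultiple(data, param_maps)


data = [1.0, 2.0, 3.0]
estimator = ToyEstimator()
fitted = estimator.fit(data)  # analogous to Pipeline(...).fit(training_df)

# Passing the unfitted estimator works.
results = toy_cross_validate(estimator, data, [{}])

# Passing the already-fitted model fails in the same way as the question's code.
try:
    toy_cross_validate(fitted, data, [{}])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The cross-validator has to refit on each fold, so it needs the unfitted recipe rather than one particular fitted result.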

Your code should be modified as follows.

Create the pipeline without fitting it:

rf_pipe = Pipeline(stages=[va, rf])

Use that pipeline as the estimator in the CrossValidator:

crossValidator = CrossValidator(estimator=rf_pipe, 
                          estimatorParamMaps=paramGrid, 
                          evaluator=bce, 
                          numFolds=3)

Putting it all together:

....

# Train the model and calculate the AUC using a BinaryClassificationEvaluator
rf_pipe = Pipeline(stages=[va, rf])
rf_pipeline = rf_pipe.fit(training_df)

...

crossValidator = CrossValidator(estimator=rf_pipe,
                          estimatorParamMaps=paramGrid, 
                          evaluator=bce, 
                          numFolds=3)

model = crossValidator.fit(training_df)
