PySpark: pipeline fails with "java.util.NoSuchElementException: Param generateMissingLabels does not exist"

Posted 2024-09-30 12:26:13


Running an ML pipeline built from StringIndexer, OneHotEncoder, VectorAssembler and LightGBMClassifier fails with:

"java.util.NoSuchElementException: Param generateMissingLabels does not exist."

I really don't know why this happens.

I'm using the following code to build a LightGBM model that estimates a user's level:

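import pyspark.ml.feature as ft  # assumption: "ft" is pyspark.ml.feature; the import is not shown in the post
from pyspark.ml import Pipeline
from mmlspark.lightgbm import LightGBMClassifier  # assumption: MMLSpark wrapper (SynapseML: synapse.ml.lightgbm)

# Map each categorical string column to a numeric category index
indexers = [ft.StringIndexer(inputCol=col, outputCol='{0}_indexed'.format(col))
            for col in ['native_province_name', 'horoscope']]

# One-hot encode each indexed column
encoders = [ft.OneHotEncoder(inputCol=indexer.getOutputCol(),
                             outputCol='{0}_encoded'.format(indexer.getOutputCol()))
            for indexer in indexers]

# Combine the raw and encoded columns into a single feature vector
assembler = ft.VectorAssembler(
    inputCols=['gender', 'age'] + [encoder.getOutputCol() for encoder in encoders],
    outputCol='features')

lgb = LightGBMClassifier(featuresCol='features', labelCol='level')

pipeline = Pipeline(stages=indexers + encoders + [assembler, lgb])

full_df_1026_train, full_df_1026_test = full_df_1026.randomSplit([0.7, 0.3], seed=1998)

lgbmodel_onlywithage = pipeline.fit(full_df_1026_train)
test_lgbmodel_onlywithage = lgbmodel_onlywithage.transform(full_df_1026_test)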

I get the following error:

Py4JJavaError: An error occurred while calling o1292.getParam.
: java.util.NoSuchElementException: Param generateMissingLabels does not exist.
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:601)
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:601)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getParam(params.scala:600)
    at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
    at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
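A note on reading the trace: the exception is raised in getParam, which Python invoked through Py4J on a JVM object (o1292). When a Java-backed PySpark estimator is fitted, it copies its set and default params to the companion JVM object, looking each one up by name with getParam. The error therefore means the Python wrapper knows a param called generateMissingLabels that the JVM class it wraps does not. One plausible cause, not confirmed in this post, is that the Python LightGBM package and the JAR loaded into Spark come from different releases. The helper below is a hypothetical sketch (params_missing_on_jvm is not a library function) that lists such params; it assumes the public stage.params list, the private stage._java_obj Py4J handle, and the Scala Params.hasParam method.

```python
from typing import Any, List


def params_missing_on_jvm(stage: Any) -> List[str]:
    """Return names of Python-side params that the stage's JVM object lacks.

    Assumes `stage` is a PySpark JavaWrapper-based estimator or transformer:
    - `stage.params` is the public list of pyspark.ml.param.Param objects;
    - `stage._java_obj` is a private attribute holding the Py4J handle to the
      Scala object, whose Params.hasParam(name) returns a Boolean.
    """
    jvm_obj = stage._java_obj
    # Keep only the params the Python wrapper declares but the JVM class does not
    return [p.name for p in stage.params if not jvm_obj.hasParam(p.name)]
```

Called as params_missing_on_jvm(lgb), a non-empty result would suggest a mismatch between the Python wrapper and the JAR rather than a problem with the data or the feature stages.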

The error points to this line:

lgbmodel_onlywithage = pipeline.fit(full_df_1026_train)
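Because the failure surfaces in pipeline.fit, another way to localize it is to fit growing prefixes of the stage list and stop at the first one that raises. The sketch below is hypothetical (first_failing_stage is not a PySpark API) and assumes nothing about Spark itself: the pipeline factory is passed in, for example lambda s: Pipeline(stages=list(s)), so the logic can be tried without a cluster. It refits every prefix, so on real data it is best run on a small sample.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def first_failing_stage(
    stages: Sequence[Any],
    df: Any,
    make_pipeline: Callable[[Sequence[Any]], Any],
) -> Tuple[Optional[int], Optional[BaseException]]:
    """Fit stages[:1], stages[:2], ... and report the first stage whose
    addition makes fit() raise.

    Returns (index_of_failing_stage, exception), or (None, None) if the
    full stage list fits. `make_pipeline` builds a pipeline-like object
    with a fit(df) method from a list of stages.
    """
    for i in range(1, len(stages) + 1):
        try:
            make_pipeline(stages[:i]).fit(df)
        except Exception as exc:  # Py4JJavaError is a subclass of Exception
            return i - 1, exc
    return None, None
```

For example: first_failing_stage(pipeline.getStages(), full_df_1026_train.limit(1000), lambda s: Pipeline(stages=list(s))). Here the stages are the two indexers, the two encoders, the assembler and the classifier, so an index of 5 would point at LightGBMClassifier rather than at the feature stages.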

Can anyone advise on this? Thanks.


