<p>I am trying to run cross-validation on a random forest in Spark.</p>
<pre><code>from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.mllib.regression import LabeledPoint  # needed for the LabeledPoint rows below
data = nds.sc.parallelize([
LabeledPoint(0.0, [0,402,6,0]),
LabeledPoint(0.0, [3,500,3,0]),
LabeledPoint(1.0, [1,590,1,1]),
LabeledPoint(1.0, [3,328,5,0]),
LabeledPoint(1.0, [4,351,4,0]),
LabeledPoint(0.0, [2,372,2,0]),
LabeledPoint(0.0, [4,302,5,0]),
LabeledPoint(1.0, [1,387,2,0]),
LabeledPoint(1.0, [1,419,3,0]),
LabeledPoint(0.0, [1,370,5,0]),
LabeledPoint(0.0, [1,410,4,0]),
LabeledPoint(0.0, [2,509,7,1]),
LabeledPoint(0.0, [1,307,5,0]),
LabeledPoint(0.0, [0,424,4,1]),
LabeledPoint(0.0, [1,509,2,1]),
LabeledPoint(1.0, [3,361,4,0]),
])
train=data.toDF(['label','features'])
numfolds =2
rf = RandomForestClassifier(labelCol="label", featuresCol="features")
evaluator = MulticlassClassificationEvaluator()
paramGrid = (ParamGridBuilder()
             .addGrid(rf.maxDepth, [4, 8, 10])
             .addGrid(rf.impurity, ['entropy', 'gini'])
             .addGrid(rf.featureSubsetStrategy, [6, 8, 10])
             .build())
pipeline = Pipeline(stages=[rf])
crossval = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=paramGrid,
    evaluator=evaluator,
    numFolds=numfolds)
model = crossval.fit(train)
</code></pre>
<p>I get the following error:</p>
^{pr2}$
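<p>It seems that paramGrid is not reading my input as a list. Is there another format I should use, or a workaround? Any help would be greatly appreciated.</p>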
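<p>One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: <code>featureSubsetStrategy</code> is a string-valued parameter in Spark's random forest (named strategies such as <code>'auto'</code>, <code>'sqrt'</code>, <code>'log2'</code>), whereas the grid above passes the integers 6, 8 and 10. The plain-Python sketch below needs no Spark. It mirrors the grid in a dictionary, expands it the way a parameter grid does (one combination per element of the Cartesian product), and lists any values that are not strings. The names <code>grid</code>, <code>expand</code> and <code>non_string_values</code> are hypothetical helpers, not part of PySpark.</p>

```python
from itertools import product

# Hypothetical stand-in for the grid in the question; plain Python, no Spark.
grid = {
    "maxDepth": [4, 8, 10],
    "impurity": ["entropy", "gini"],
    # Assumption: featureSubsetStrategy expects strings, so named strategies
    # are used here in place of the integers 6, 8, 10 from the question.
    "featureSubsetStrategy": ["auto", "sqrt", "log2"],
}


def expand(grid):
    """Return every combination of grid values, one dict per combination."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def non_string_values(grid, name):
    """Return the values listed under `name` that are not Python strings."""
    return [value for value in grid[name] if not isinstance(value, str)]


combos = expand(grid)
print(len(combos))  # 18, i.e. 3 * 2 * 3 combinations

# The integers from the question are flagged; the string strategies are not.
print(non_string_values({"featureSubsetStrategy": [6, 8, 10]},
                        "featureSubsetStrategy"))
print(non_string_values(grid, "featureSubsetStrategy"))
```

<p>With 18 combinations and <code>numFolds = 2</code>, the cross-validator would fit 36 models on this grid, so a check like this is cheap compared with a failed run.</p>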