I have a DataFrame ready and transformed it with VectorAssembler so that it can be used with the ML library:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier
target_index = StringIndexer(inputCol="target", outputCol="target_idx").fit(df)
assembler = VectorAssembler(
    inputCols=[
        x for x in df.columns if x not in ['target', 'ident_1', 'id_l', 'target_idx']
    ],
    outputCol='features'
)
cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])
model = pipe.fit(df_train)
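# transform() runs every fitted stage and returns a new DataFrame with the
# 'features' column; model.stages[1] would only return the VectorAssembler stage itself
df_transformed = model.transform(df_train)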
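Now I want to write the transformed dataset to an ARFF file. Is there a way to write a PySpark DataFrame that has already been transformed by VectorAssembler to ARFF format?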
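As far as I know, Spark does not ship a built-in ARFF writer, so one workaround is to stream the rows to the driver and write the file by hand. The following is a minimal sketch, not a library API: `write_arff` and `_quote` are hypothetical helpers, and it assumes every VectorAssembler input column is a single numeric column (so vector slot i corresponds to `inputCols[i]`) and that the target is written as a nominal class attribute.

```python
import io
import math
from typing import Iterable, Sequence, TextIO, Tuple


def _quote(value: str) -> str:
    """Return an ARFF token, single-quoting it unless it is plainly alphanumeric."""
    if value and all(c.isalnum() or c in "_-." for c in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def write_arff(
    out: TextIO,
    relation: str,
    feature_names: Sequence[str],
    class_name: str,
    class_values: Sequence[str],
    rows: Iterable[Tuple[Sequence[float], str]],
) -> None:
    """Write numeric features plus one nominal class attribute as dense ARFF.

    Each row is (feature_values, class_label); NaN feature values are
    written as '?', ARFF's marker for a missing value.
    """
    out.write(f"@RELATION {_quote(relation)}\n\n")
    for name in feature_names:
        out.write(f"@ATTRIBUTE {_quote(name)} NUMERIC\n")
    nominal = ",".join(_quote(v) for v in class_values)
    out.write(f"@ATTRIBUTE {_quote(class_name)} {{{nominal}}}\n\n")
    out.write("@DATA\n")
    for features, label in rows:
        if len(features) != len(feature_names):
            raise ValueError("feature vector length does not match feature_names")
        values = ["?" if math.isnan(x) else repr(float(x)) for x in features]
        values.append(_quote(str(label)))
        out.write(",".join(values) + "\n")


if __name__ == "__main__":
    # Tiny in-memory demo with made-up data.
    buf = io.StringIO()
    write_arff(buf, "demo", ["a", "b c"], "target", ["yes", "no"],
               [([1.0, 2.5], "yes"), ([0.0, float("nan")], "no")])
    print(buf.getvalue())
```

With the pipeline from the question, `feature_names` could be `assembler.getInputCols()`, `class_values` could be `target_index.labels`, and `rows` could be a generator such as `((r.features.toArray(), r.target) for r in model.transform(df_train).select('features', 'target').toLocalIterator())`, written into a file opened with `open('out.arff', 'w')`. Because `toLocalIterator()` pulls one partition at a time to the driver, this avoids collecting the whole dataset at once, but the file is still written on a single machine.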