PySpark: AttributeError: 'PipelineModel' object has no attribute 'clusterCenters'

Posted 2024-09-29 19:24:38


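I built a k-means model with PySpark. Now I also want to extract the cluster centers. How can I do that when the model is part of a pipeline? This is the code I have so far, but it throws the error: AttributeError: 'PipelineModel' object has no attribute 'clusterCenters'. How can I fix it?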

#### model K-Means ###

from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans, KMeansModel

kmeans = KMeans() \
          .setK(3) \
          .setFeaturesCol("scaledFeatures")\
          .setPredictionCol("cluster")

# Wrap the KMeans estimator in a single-stage Pipeline
pipeline = Pipeline(stages=[kmeans])

model = pipeline.fit(matrix_normalized)

cluster = model.transform(matrix_normalized)

# Get cluster centers (this is the line that raises the AttributeError)
centers = model.clusterCenters()

1 Answer
User
#1 · Posted 2024-09-29 19:24:38

Dummy data:

from pyspark.ml.linalg import Vectors
from pyspark.ml.clustering import KMeans, KMeansModel
from pyspark.ml.pipeline import Pipeline


data = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
        (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
# Assumes an active SparkSession named `spark`
matrix_normalized = spark.createDataFrame(data, ["scaledFeatures"])

Your code:

kmeans = KMeans() \
          .setK(3) \
          .setFeaturesCol("scaledFeatures")\
          .setPredictionCol("cluster")

# Wrap the KMeans estimator in a single-stage Pipeline
pipeline = Pipeline(stages=[kmeans])

model = pipeline.fit(matrix_normalized)

cluster = model.transform(matrix_normalized)

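Just change the last line. The fitted PipelineModel does not have a clusterCenters() method of its own; the fitted KMeansModel is its first (and here only) stage, so call the method on that stage: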

model.stages[0].clusterCenters()

[array([0.5, 0.5]), array([8., 9.]), array([9., 8.])]
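If the pipeline later gains more stages (a scaler before the k-means step, for example), the hard-coded index 0 will no longer point at the k-means model. One possible way around that is to look the stage up by type instead of by position. The following is only a sketch: first_stage_of_type is a hypothetical helper, not part of PySpark, and the Fake* classes are stand-ins that only mimic the stages attribute of a PipelineModel so the example runs without a Spark session.

```python
# Hypothetical helper (not part of PySpark): return the first fitted stage of
# a given type from any object that exposes a `stages` list, as a fitted
# PipelineModel does.
def first_stage_of_type(pipeline_model, stage_type):
    for stage in pipeline_model.stages:
        if isinstance(stage, stage_type):
            return stage
    raise ValueError(f"no stage of type {stage_type.__name__} in pipeline")


# Stand-in classes (assumptions for illustration only) so the sketch can run
# without Spark installed.
class FakeScalerModel:
    pass


class FakeKMeansModel:
    def clusterCenters(self):
        return [[0.5, 0.5], [8.0, 9.0], [9.0, 8.0]]


class FakePipelineModel:
    def __init__(self, stages):
        self.stages = stages


# The k-means stage is at index 1 here, so stages[0] would be the wrong one.
fitted = FakePipelineModel([FakeScalerModel(), FakeKMeansModel()])
centers = first_stage_of_type(fitted, FakeKMeansModel).clusterCenters()
print(centers)
```

With the real objects from the code above, the call would presumably be first_stage_of_type(model, KMeansModel).clusterCenters(), using the KMeansModel class that is already imported.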
