Getting labels from StringIndexer stages within a Pipeline in Spark (pyspark)

indexers = [
    StringIndexer(inputCol=column, outputCol=column + '_index').setHandleInvalid('skip')
    for column in list(set(data_frame.columns) - ignore_columns)
]
pipeline = Pipeline(stages=indexers)
new_data_frame = pipeline.fit(data_frame).transform(data_frame)

1 Answer

User

#1 · Posted 2024-09-27 07:30:45

Example data and Pipeline:

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, StringIndexerModel

df = spark.createDataFrame([("a", "foo"), ("b", "bar")], ("x1", "x2"))

pipeline = Pipeline(stages=[
    StringIndexer(inputCol=c, outputCol='{}_index'.format(c))
    for c in df.columns
])

model = pipeline.fit(df)

Extracting from the stages:

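# On Spark 2.x, where StringIndexerModel has no getOutputCol(),
# use x._java_obj.getOutputCol() instead.
{x.getOutputCol(): x.labels
for x in model.stages if isinstance(x, StringIndexerModel)}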

{'x1_index': ['a', 'b'], 'x2_index': ['foo', 'bar']}
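
Each list is ordered by index: the label at position i is the value that StringIndexer encoded as i. As a small illustration, here is a plain-Python sketch that decodes an index back to its label. The helper name `decode_index` is hypothetical and not part of pyspark, and the `labels` dictionary is copied from the output above.

```python
# Label lists as shown above: position in the list == assigned index.
labels = {'x1_index': ['a', 'b'], 'x2_index': ['foo', 'bar']}


def decode_index(labels_by_column, column, index):
    """Return the original label for an index value (hypothetical helper)."""
    # StringIndexer emits doubles (0.0, 1.0, ...), so cast before indexing.
    return labels_by_column[column][int(index)]


print(decode_index(labels, 'x1_index', 1.0))  # b
print(decode_index(labels, 'x2_index', 0.0))  # foo
```

For whole DataFrame columns, pyspark.ml.feature.IndexToString performs the same lookup inside Spark.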

From the metadata of the transformed DataFrame:

indexed = model.transform(df)

{c.name: c.metadata["ml_attr"]["vals"]
for c in indexed.schema.fields if c.name.endswith("_index")}

{'x1_index': ['a', 'b'], 'x2_index': ['foo', 'bar']}
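
The metadata approach relies on each indexed column carrying its label list under the `ml_attr` key, in the `vals` entry. The sketch below walks that layout using a plain dictionary shaped like the result of `indexed.schema.jsonValue()`, so it runs without a Spark session. The `sample_schema` value is hand-written for illustration (real metadata holds additional keys), and `labels_from_schema` is a hypothetical helper name.

```python
def labels_from_schema(schema_json, suffix="_index"):
    """Collect label lists from a schema dict (hypothetical helper)."""
    result = {}
    for field in schema_json["fields"]:
        # Columns without ML attributes have empty metadata, so use .get().
        vals = field.get("metadata", {}).get("ml_attr", {}).get("vals")
        if field["name"].endswith(suffix) and vals is not None:
            result[field["name"]] = vals
    return result


# Hand-written sample, trimmed to the keys the helper reads.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "x1", "type": "string", "nullable": True, "metadata": {}},
        {"name": "x1_index", "type": "double", "nullable": False,
         "metadata": {"ml_attr": {"vals": ["a", "b"]}}},
    ],
}

print(labels_from_schema(sample_schema))  # {'x1_index': ['a', 'b']}
```

With a live session, the same helper would be called as labels_from_schema(indexed.schema.jsonValue()).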
