Convert all fields in a StructType to an array

Posted 2024-10-02 00:35:00


I have a StructType with more than 1000 fields, and every field is of type string:

root
 |-- mac: string (nullable = true)
 |-- kv: struct (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_FEAT_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_FEAT_CODE: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_HELP_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_HELP_CODE: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_SYST_B64: string (nullable = true)
 |    |-- FTP_SERVER_ANAUTHORIZED_SYST_CODE: string (nullable = true)
 |    |-- FTP_SERVER_HELLO_B64: string (nullable = true)
 |    |-- FTP_STATUS_HELLO_CODE: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_ACTION_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_DETECTION_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_INPUT_PASSWORD_NAME_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_INPUT_TEXT_NAME_0: string (nullable = true)
 |    |-- HTML_LOGIN_FORM_METHOD_0: string (nullable = true)
 |    |-- HTML_REDIRECT_TYPE_0: string (nullable = true)

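I only want to select the non-null fields, along with some identifier for each non-null field. Is there a way to convert this struct into an array without referencing each element explicitly?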


1 answer
User
#1 · Posted 2024-10-02 00:35:00

I would use a UDF. Inside the UDF the struct arrives as a Row, which can be iterated like a tuple:

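from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# The kv struct reaches the UDF as a Row; a Row is a tuple, so iterating
# over it yields the field values in schema order. Keep only the non-null
# values, and return null when the whole struct is null.
as_array = udf(
    lambda row: None if row is None else [x for x in row if x is not None],
    ArrayType(StringType()))

df = df.withColumn("arr", as_array(df["kv"]))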
