How to convert an array of strings into an array of structs with a condition

transform_expr = """
    transform(split(_c0, '[|]'), (x, i) ->
        struct(
            IF(x like '%=%', substring_index(x, '=', 1), concat('_c0', i+1)),
            substring_index(x, '=', -1)
        )
    )
"""
df = (df.select("_c0", explode(map_from_entries(expr(transform_expr))).alias("col_name", "col_value"))
        .groupby("_c0")
        .pivot('col_name')
        .agg(first('col_value'))
        .drop("_c0"))
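To see what the Spark SQL expression does to a single value of _c0, here is a plain-Python sketch of the same per-row logic that runs without Spark. The helper substring_index only imitates the Spark SQL function of that name; parse_row and the sample string are hypothetical, made up for illustration.

```python
# Plain-Python sketch of the question's transform expression, one row at a time.
# parse_row and the sample string are hypothetical; substring_index imitates
# the Spark SQL function of the same name.

def substring_index(s, delim, count):
    """Text before the count-th delim (count > 0), or after the |count|-th
    delim counted from the end (count < 0)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def parse_row(c0, prefix="_c0"):
    """Mimic transform(split(_c0, '[|]'), ...) followed by map_from_entries."""
    result = {}
    for i, x in enumerate(c0.split("|")):
        # IF(x like '%=%', substring_index(x, '=', 1), concat('_c0', i+1))
        name = substring_index(x, "=", 1) if "=" in x else f"{prefix}{i + 1}"
        # substring_index(x, '=', -1): text after the last '=' (or all of x)
        result[name] = substring_index(x, "=", -1)
    return result


print(parse_row("a|b|c|clm4=9"))
# {'_c01': 'a', '_c02': 'b', '_c03': 'c', 'clm4': '9'}
```

Each dict corresponds to one row after explode and pivot. Every element gets a name, either its own key or a positional default, so every name that appears in any row becomes a column.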

1 answer

User

Answer 1 · posted 2024-09-28 01:28:46

You can put the columns you want in a list and use that list to filter the transformed array:

column_list = ["clm1", "clm2", "clm3", "clm4", "clm6", "clm7", "clm8"]

Then add this filter after the transform step, using the filter higher-order function:

column_filter = ','.join(f"'{c}'" for c in column_list)

transform_expr = f"""
            filter(transform(split(_c0, '[|]'), (x, i) -> 
                               struct(
                                     IF(x like '%=%', substring_index(x, '=', 1), concat('clm', i+1)) as name, 
                                     substring_index(x, '=', -1) as value
                                     )
                    ), x -> x.name in ({column_filter}))
            """

This drops every column whose name is not in the list.
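As a rough plain-Python equivalent (no Spark needed), the sketch below builds the same column_filter string and then keeps only the (name, value) pairs whose name is in column_list, which is what filter(...) does before map_from_entries. The function parse_and_filter and the sample string are hypothetical.

```python
# Plain-Python sketch of transform(...) followed by filter(...).
# parse_and_filter and the sample string are hypothetical.
column_list = ["clm1", "clm2", "clm3", "clm4", "clm6", "clm7", "clm8"]

# Same string the answer builds for the SQL IN (...) predicate.
# Assumes no column name contains a single quote.
column_filter = ','.join(f"'{c}'" for c in column_list)


def parse_and_filter(c0, column_list, prefix="clm"):
    """Split on '|', name each element, then keep only the wanted names."""
    wanted = set(column_list)
    pairs = []
    for i, x in enumerate(c0.split("|")):
        # like IF(x like '%=%', substring_index(x, '=', 1), concat('clm', i+1))
        name = x.split("=", 1)[0] if "=" in x else f"{prefix}{i + 1}"
        # like substring_index(x, '=', -1): text after the last '='
        value = x.rsplit("=", 1)[-1]
        pairs.append((name, value))
    # like filter(..., x -> x.name in (...)), then map_from_entries
    return {name: value for name, value in pairs if name in wanted}


print(column_filter)
# 'clm1','clm2','clm3','clm4','clm6','clm7','clm8'
print(parse_and_filter("a|b|c|clm4=1|clm5=7", column_list))
# {'clm1': 'a', 'clm2': 'b', 'clm3': 'c', 'clm4': '1'}
```

Here clm5 is dropped because it is not in column_list, so it never reaches the pivot.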

Finally, pivot as before and use a simple select to add any missing columns as nulls:

from pyspark.sql.functions import col, explode, expr, first, lit, map_from_entries

df = (df.select("_c0", explode(map_from_entries(expr(transform_expr))).alias("col_name", "col_value"))
        .groupby("_c0")
        .pivot("col_name")
        .agg(first("col_value"))
        .drop("_c0"))

## add missing columns as nulls
final_columns = [col(c).alias(c) if c in df.columns else lit(None).alias(c) for c in column_list]

df.select(*final_columns).show()

#+----+----+----+----+----+----+----+
#|clm1|clm2|clm3|clm4|clm6|clm7|clm8|
#+----+----+----+----+----+----+----+
#|   a|   b|   c|   9|  60|  23|null|
#|   a|   b|   c|   1|null|null|null|
#+----+----+----+----+----+----+----+
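Putting the steps together, here is a plain-Python sketch of the whole pipeline, including the "add missing columns as nulls" projection. The input strings are hypothetical (they are not from the question) and were chosen only so that the result has the same shape as the table above; to_rows and parse_and_filter are likewise illustrative names.

```python
# Plain-Python sketch of filter + pivot + "add missing columns as nulls".
# Hypothetical input strings, chosen only to match the shape of the table above.
column_list = ["clm1", "clm2", "clm3", "clm4", "clm6", "clm7", "clm8"]


def parse_and_filter(c0, wanted, prefix="clm"):
    """Return {name: value} for the wanted names found in one string."""
    result = {}
    for i, x in enumerate(c0.split("|")):
        name = x.split("=", 1)[0] if "=" in x else f"{prefix}{i + 1}"
        if name in wanted:
            result[name] = x.rsplit("=", 1)[-1]
    return result


def to_rows(strings, column_list):
    """One output row per input string, columns in column_list order."""
    wanted = set(column_list)
    rows = []
    for s in strings:
        parsed = parse_and_filter(s, wanted)
        # dict.get gives None for absent names: this covers both the nulls the
        # pivot leaves and the columns the answer adds with lit(None)
        rows.append([parsed.get(c) for c in column_list])
    return rows


rows = to_rows(["a|b|c|clm4=9|clm6=60|clm7=23", "a|b|c|clm4=1|clm5=3"], column_list)
for row in rows:
    print(row)
# ['a', 'b', 'c', '9', '60', '23', None]
# ['a', 'b', 'c', '1', None, None, None]
```

In Spark the two sources of nulls differ: clm6 and clm7 are null in the second row because the pivot found no value for them there, while clm8 is null everywhere because no row produced it and the final select fills it with lit(None).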
