How do I split a list of dictionaries in one column into two columns in a PySpark DataFrame?

Posted 2024-10-16 20:51:44


I want to split the filteredaddress column of this Spark DataFrame into two new columns, Flag and Address:

customer_id|pincode|filteredaddress|                                                              Flag| Address
1000045801 |121005 |[{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]

Can anyone tell me how to do this?


1 Answer
User
#1 · Posted 2024-10-16 20:51:44

You can get the values from the filteredaddress map column by looking up its keys:

df2 = df.selectExpr(
    'customer_id', 'pincode',
    "filteredaddress['flag'] as flag", "filteredaddress['address'] as address"
)

Other ways to access the map values:

import pyspark.sql.functions as F

df.select(
    'customer_id', 'pincode',
    F.col('filteredaddress')['flag'],
    F.col('filteredaddress')['address']
)

# or, more simply

df.select(
    'customer_id', 'pincode',
    'filteredaddress.flag',
    'filteredaddress.address'
)
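
The key lookups above assume filteredaddress is a single map. If the column is actually an array of maps, as the sample rows in the question suggest, each lookup would need an array index first. Here is a minimal sketch of that variant, assuming the column type is array<map<string,string>> and that only the first entry of each list matters. The helper names first_entry_exprs, split_first_entry and demo are hypothetical, and the row in demo is shortened sample data, not the asker's real DataFrame.

```python
def first_entry_exprs(column, keys=("flag", "address")):
    """Build SQL expressions that read `keys` from the first map in an array column."""
    # column[0] picks the first map in the array; ['key'] then looks up its value.
    return [f"{column}[0]['{key}'] as {key}" for key in keys]


def split_first_entry(df, column="filteredaddress"):
    """Return df with flag and address taken from the first map in `column`.

    Assumption: `column` has type array<map<string,string>>.
    """
    return df.selectExpr("customer_id", "pincode", *first_entry_exprs(column))


def demo():
    # Imported here so the helpers above can be loaded without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Shortened, made-up row in the same shape as the question's sample.
    rows = [(1000045801, 121005, [{"flag": "0", "address": "House number 172"}])]
    df = spark.createDataFrame(rows, ["customer_id", "pincode", "filteredaddress"])
    split_first_entry(df).show(truncate=False)
```

Calling demo() on a machine with PySpark installed should print one row with separate flag and address columns. If a list can hold more than one dictionary, exploding the array with F.explode before the key lookups is probably a better fit than indexing with [0].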
