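I need to modify values in a nested JSON document using PySpark while keeping the schema intact. The schema must be identical to the original JSON; only the values of a few fields should change.
Below is a sample of my JSON.
I want to change the values of TAG1 and TAG2, and of ADDR1 and ADDR2 in both account and holder.
Source JSON: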
{
"references": [
{
"TAG1": VALUE1,
"TAG2": "VALUE2",
"TAG3": VALUE3,
"TAG4": "VALUE4",
"account": [
{
"ID": A_VALUE1,
"BANK_ID": A_VALUE2,
"ADDR1": "A_VALUE3",
"ADDR2": "A_VALUE4"
}
],
"holder": {
"ID": H_VALUE1,
"BANK_ID": H_VALUE2,
"ADDR1": "H_VALUE3",
"ADDR2": "H_VALUE4"
}
},
{
"TAG1": VALUE1,
"TAG2": "VALUE2",
"TAG3": VALUE3,
"TAG4": "VALUE4",
"account": [
{
"ID": A_VALUE1,
"BANK_ID": A_VALUE2,
"ADDR1": "A_VALUE3",
"ADDR2": "A_VALUE4"
}
],
"holder": {
"ID": H_VALUE1,
"BANK_ID": H_VALUE2,
"ADDR1": "H_VALUE3",
"ADDR2": "H_VALUE4"
}
}
]
}
Output JSON:
{
"references": [
{
"TAG1": NEW_VALUE1,
"TAG2": "NEW_VALUE2",
"TAG3": VALUE3,
"TAG4": "VALUE4",
"account": [
{
"ID": A_VALUE1,
"BANK_ID": A_VALUE2,
"ADDR1": "NEW_ADDR1",
"ADDR2": "NEW_ADDR2"
}
],
"holder": {
"ID": H_VALUE1,
"BANK_ID": H_VALUE2,
"ADDR1": "NEW_ADDR1",
"ADDR2": "NEW_ADDR2"
}
},
{
"TAG1": NEW_VALUE1,
"TAG2": "NEW_VALUE2",
"TAG3": VALUE3,
"TAG4": "VALUE4",
"account": [
{
"ID": A_VALUE1,
"BANK_ID": A_VALUE2,
"ADDR1": "NEW_ADDR1",
"ADDR2": "NEW_ADDR2"
}
],
"holder": {
"ID": H_VALUE1,
"BANK_ID": H_VALUE2,
"ADDR1": "NEW_ADDR1",
"ADDR2": "NEW_ADDR2"
}
}
]
}
You can use the transform function to update the struct elements of the references array column.