JSON parsing in PySpark

select split(element,':')[0] key, e.value, id
from (
  SELECT regexp_replace(e.element,'^\\{|"| *\\[|\\]|\\}$','') element, t.id
  FROM input_df t
  lateral view explode(split(my_col,'(?<=\\]) *, *(?=\\")')) e as element
) s
lateral view explode(split(split(element,':')[1],',')) e as value

1 answer

User

#1 · Posted 2024-09-29 02:24:15

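The trick here is to parse the JSON column into a map with from_json, then explode it twice: once to turn the map into key/value rows, and once more to turn each value array into one row per element, until the data is flat.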

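import pyspark.sql.functions as f
from pyspark.sql import SparkSession

# Reuse the active session if one exists (for example in a notebook or shell)
spark = SparkSession.builder.getOrCreate()

input_df = spark.createDataFrame([
  ['A123', '{"XXX": ["123","456"],"YYY": ["246","135"]}'],
  ['B222', '{"ZZZ":["333"]}']
], schema='id string, my_col string')

output_df = (input_df
             # Parse the JSON string into a map of key -> array of strings
             .withColumn('entries', f.from_json('my_col', 'map<string, array<string>>'))
             # Exploding a map yields one row per entry, in columns named key and value
             .select('id', f.explode('entries'))
             # Exploding the array yields one row per element; aliases set the output names
             .select(f.col('id').alias('ID'), f.col('key').alias('Key'),
                     f.explode('value').alias('Value')))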

output_df.show(truncate=False)
+----+---+-----+
|ID  |Key|Value|
+----+---+-----+
|A123|XXX|123  |
|A123|XXX|456  |
|A123|YYY|246  |
|A123|YYY|135  |
|B222|ZZZ|333  |
+----+---+-----+
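
To see what the two explode steps do without a Spark session, the same flattening can be sketched in plain Python. This is only an illustration of the logic, not how Spark executes it; the helper name flatten_json_map is made up for this example, and it assumes every value in the JSON object is a list of strings, as in the sample data.

```python
import json


def flatten_json_map(row_id, json_text):
    """Yield (id, key, value) tuples for a JSON object whose values are lists.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    # Counterpart of from_json(..., 'map<string, array<string>>')
    entries = json.loads(json_text)
    # Counterpart of the first explode: one step per map entry
    for key, values in entries.items():
        # Counterpart of the second explode: one row per array element
        for value in values:
            yield (row_id, key, value)


rows = [
    ('A123', '{"XXX": ["123","456"],"YYY": ["246","135"]}'),
    ('B222', '{"ZZZ":["333"]}'),
]

flat = [t for row_id, text in rows for t in flatten_json_map(row_id, text)]
for t in flat:
    print(t)
```

Each printed tuple corresponds to one row of the flattened DataFrame: the id is repeated for every key, and each key is repeated for every element of its array.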
