Parsing dictionary values into a list in PySpark

Posted 2024-05-17 06:33:56


I want to convert the existing JSON in a column:

namedTags = [{"rid":"ri.compass..ae1","name":"reservoir"},
{"rid":"ri.compass..ed18","name":"cave"},
{"rid":"ri.compass..c97","name":"staging"}]

I only want to collect the name values into a list.

Expected output in the new column:

['reservoir','cave','staging']

The DataFrame looks like this: https://i.stack.imgur.com/X1TAv.png


3 answers
# pandas idiom (not the Spark DataFrame API); assumes each cell holds a list of dicts
df['col'] = df['some_other_col'].apply(lambda row: [x.get('name', 0) for x in row])

Assuming the column holds an array of objects, you can use transform to extract the name key from each object:

import pyspark.sql.functions as f

df = spark.createDataFrame([[[{"rid":"ri.compass..ae1","name":"reservoir"},
                              {"rid":"ri.compass..ed18","name":"cave"},
                              {"rid":"ri.compass..c97","name":"staging"}]]], ['namedTags'])


df.withColumn('name', f.expr("transform(namedTags, el -> el.name)")).show()
+--------------------+--------------------+
|           namedTags|                name|
+--------------------+--------------------+
|[[name -> reservo...|[reservoir, cave,...|
+--------------------+--------------------+

You can use a list comprehension. See the following code:

lst = [{"rid":"ri.compass..ae1","name":"reservoir"},{"rid":"ri.compass..ed18","name":"cave"},{"rid":"ri.compass..c97","name":"staging"}]
names = [i['name'] for i in lst]
print(names)

Output:

['reservoir', 'cave', 'staging']
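
If the column stores the JSON as a string rather than as already-parsed objects (the question does not say which), the same comprehension can be applied after decoding with json.loads. This is only a minimal sketch under that assumption. extract_names is a hypothetical helper name, and skipping entries that have no name key is a choice made here, not something the question asks for.

```python
import json


def extract_names(raw):
    """Return the 'name' value of every object in a JSON array string.

    Hypothetical helper: assumes `raw` is None or a JSON-encoded list of objects.
    """
    if raw is None:
        return []
    items = json.loads(raw)
    # Keep only objects that actually carry a 'name' key
    return [item["name"] for item in items if "name" in item]


raw = (
    '[{"rid":"ri.compass..ae1","name":"reservoir"},'
    '{"rid":"ri.compass..ed18","name":"cave"},'
    '{"rid":"ri.compass..c97","name":"staging"}]'
)
names = extract_names(raw)
print(names)
```

In PySpark, a plain function like this could be wrapped with pyspark.sql.functions.udf and a return type of ArrayType(StringType()), although built-in functions such as transform generally avoid the per-row Python overhead of a UDF.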
