Convert a CSV dict column to PySpark rows

id,cbgs,value
sg:bd1f26e681264baaa4b44083891c886a,060372623011,166
sg:bd1f26e681264baaa4b44083891c886a,060372655203,70
sg:bd1f26e681264baaa4b44083891c886a,060377019021,34
sg:04c7f777f01c4c75bbd9e43180ce811f,060372073012,7

fifa_df.withColumn("cbgs", F.from_json("cbgs", T.MapType(T.StringType(), T.IntegerType()))).select("id", F.explode(["visitor_home_cbgs"]).alias('cbgs', 'value')).show()
+------------------+----+-----+
|safegraph_place_id|cbgs|value|
+------------------+----+-----+
+------------------+----+-----+

2 answers

User

#1 · edited 2024-05-02 04:28:38

First parse the JSON string into a Map<String, Integer>, then explode the map. You can do it like this:

import pyspark.sql.types as T
import pyspark.sql.functions as F

...

df2.withColumn("cbgs", F.from_json("cbgs", T.MapType(T.StringType(), T.IntegerType()))).select("id", F.explode("cbgs").alias('cbgs', 'value')).show()
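
To make the two steps concrete without a Spark session, here is a minimal plain-Python sketch of what "parse the JSON into a map, then explode it" produces for each row. The helper name `explode_map` and the in-memory `rows` list are illustrative assumptions; in PySpark the same work is done by `F.from_json` and `F.explode` as in the line above.

```python
import json

# Sample rows in the shape of the question's data: (id, JSON string mapping cbgs -> count).
rows = [
    ("sg:bd1f26e681264baaa4b44083891c886a",
     '{"060372623011":166,"060372655203":70,"060377019021":34}'),
    ("sg:04c7f777f01c4c75bbd9e43180ce811f", '{"060372073012":7}'),
]

def explode_map(rows):
    """Yield one (id, cbgs, value) tuple per map entry (hypothetical helper)."""
    for row_id, cbgs_json in rows:
        parsed = json.loads(cbgs_json)      # step 1: JSON string -> dict (the "map")
        for cbgs, value in parsed.items():  # step 2: one output row per key/value pair
            yield row_id, cbgs, int(value)

exploded = list(explode_map(rows))
for record in exploded:
    print(record)
```

Each input row turns into as many output rows as its map has entries, with the `id` repeated on every one.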

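User

#2 · edited 2024-05-02 04:28:38

Here is the approach I followed. It involves only string operations, not complex data type handling.

1. Read the source CSV file with the escape option:
df=spark.read.format('csv').option('header','True').option('escape','"')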

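+-----------------------------------+--------------------------------------------------------+
|id                                 |cbgs                                                    |
+-----------------------------------+--------------------------------------------------------+
|sg:bd1f26e681264baaa4b44083891c886a|{"060372623011":166,"060372655203":70,"060377019021":34}|
|sg:04c7f777f01c4c75bbd9e43180ce811f|{"060372073012":7}                                      |
+-----------------------------------+--------------------------------------------------------+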
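
The escape option probably matters because of how the JSON field is quoted in the file. The raw file is not shown in this thread, so the sketch below assumes the common layout in which the field is wrapped in double quotes and every inner quote is doubled. Spark's CSV reader escapes with a backslash by default, so setting `escape` to `"` lets it read a doubled quote as one literal quote. Python's standard `csv` module applies that rule by default, which makes it a convenient way to check the expected result:

```python
import csv
import io

# ASSUMED raw file content: the JSON field is quoted and its inner quotes are doubled.
raw = (
    'id,cbgs\n'
    'sg:bd1f26e681264baaa4b44083891c886a,'
    '"{""060372623011"":166,""060372655203"":70,""060377019021"":34}"\n'
    'sg:04c7f777f01c4c75bbd9e43180ce811f,"{""060372073012"":7}"\n'
)

# csv.DictReader treats "" inside a quoted field as a single literal quote,
# and keeps the commas inside the quoted field as part of the value.
parsed_rows = list(csv.DictReader(io.StringIO(raw)))
for row in parsed_rows:
    print(row["id"], row["cbgs"])
```

If the real file uses a different quoting scheme, the escape setting would need to change accordingly.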

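2. The second column is loaded as a string rather than a map. Now split the column on commas:
df=df.withColumn('cbgs',split(df['cbgs'],','))

3. Then explode the resulting array so that each element gets its own row: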

df=df.withColumn('cbgs',explode(df['cbgs']))

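+-----------------------------------+-------------------+
|id                                 |cbgs               |
+-----------------------------------+-------------------+
|sg:bd1f26e681264baaa4b44083891c886a|{"060372623011":166|
|sg:bd1f26e681264baaa4b44083891c886a|"060372655203":70  |
|sg:bd1f26e681264baaa4b44083891c886a|"060377019021":34} |
|sg:04c7f777f01c4c75bbd9e43180ce811f|{"060372073012":7} |
+-----------------------------------+-------------------+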

4. Use a regex to extract the key and the value from each piece in the cbgs column:

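+-----------------------------------+------------+-----+
|id                                 |cbgs        |value|
+-----------------------------------+------------+-----+
|sg:bd1f26e681264baaa4b44083891c886a|060372623011|166  |
|sg:bd1f26e681264baaa4b44083891c886a|060372655203|70   |
|sg:bd1f26e681264baaa4b44083891c886a|060377019021|34   |
|sg:04c7f777f01c4c75bbd9e43180ce811f|060372073012|7    |
+-----------------------------------+------------+-----+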
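
The regex expression used in this answer was not preserved, so the following is only a plain-Python sketch of the split, explode and extract steps on the same sample data. The pattern `"(\d+)":(\d+)` and the names `PAIR`, `rows` and `result` are assumptions, not the answer's original code. In PySpark, `regexp_extract(column, pattern, group_index)` from `pyspark.sql.functions` could be given a pattern of this kind, once for the key and once for the value.

```python
import re

# Rows as loaded in step 1: (id, cbgs as a plain string).
rows = [
    ("sg:bd1f26e681264baaa4b44083891c886a",
     '{"060372623011":166,"060372655203":70,"060377019021":34}'),
    ("sg:04c7f777f01c4c75bbd9e43180ce811f", '{"060372073012":7}'),
]

# Assumed pattern: a quoted run of digits (the key), a colon, then the integer count.
PAIR = re.compile(r'"(\d+)":(\d+)')

result = []
for row_id, cbgs_str in rows:
    for piece in cbgs_str.split(","):   # steps 2-3: split on commas, one piece per row
        match = PAIR.search(piece)      # step 4: pull the key and value out of the piece
        if match:
            # The count is converted to int here; a regex extract alone yields strings.
            result.append((row_id, match.group(1), int(match.group(2))))

for record in result:
    print(record)
```

Because the pattern anchors on the quotes and the colon, the stray `{` and `}` left over from the comma split are ignored.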

5. Write the result to CSV.
