I'm trying to split a DataFrame column in PySpark. This is the data I have:
from pyspark.sql.functions import split

df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3, 'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Splitted', split(df['Value'], '|')[0])
I get:
+---+------+--------+
|Key| Value|Splitted|
+---+------+--------+
|  1|Foo|10|       F|
|  2|Bar|11|       B|
|  3|Car|12|       C|
+---+------+--------+
But I want:
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|  1|   10|     Foo|
|  2|   11|     Bar|
|  3|   12|     Car|
+---+-----+--------+
Can someone tell me what I'm doing wrong?
What if I have a situation like this, with three pipe-separated fields?
df = sc.parallelize([[1, 'Foo|10|we'], [2, 'Bar|11|we'], [3,'Car|12|we']]).toDF(['Key', 'Value'])
+---+---------+
|Key| Value|
+---+---------+
| 1|Foo|10|we|
| 2|Bar|11|we|
| 3|Car|12|we|
+---+---------+