Changing the delimiter in only one column

1 answer

User

#1 · Posted 2024-09-25 00:36:08

This gives the desired result for the quoted column of the first row:

a[0].split('"')[1].replace(",", "#")

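But something tells me this is not very useful or general.

In any case, a solution to this kind of problem will probably involve the following two string methods: split and replace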

https://docs.python.org/3/library/stdtypes.html#str.split

https://docs.python.org/3/library/stdtypes.html#str.replace
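
As a rough sketch of how the two methods combine, here is a shortened, made-up row (not the row from the question) with one double-quoted column. The names row, parts and quoted_fixed are only illustrative.

```python
# Hypothetical sample row: one double-quoted column that contains commas.
row = '0,Italy,"tropical fruit, broom, dried herb",87'

# Splitting on the double quote yields three pieces:
# the text before the quoted column, the quoted column, and the text after it.
parts = row.split('"')

# Replace commas only inside the quoted column (index 1).
quoted_fixed = parts[1].replace(",", "#")

print(parts)
print(quoted_fixed)
```

Only the middle piece is touched, so the commas that actually separate columns stay as they are.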

Update

So, if you need to use a Spark RDD, you can first create the RDD from the list of strings (not yet parsed as CSV)

>>> rdd = sc.parallelize(a)
>>> rdd.take(1)
['0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isnt overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",Vulk\xc3\xa0 Bianco,87,,Sicily & Sardinia,Etna,,Kerin O\xe2\x80\x99Keefe,@kerinokeefe,Nicosia 2013 Vulk\xc3\xa0 Bianco (Etna),White Blend,Nicosia']
>>> processed_rdd = rdd.map(lambda row: row.split('"')[0] + row.split('"')[1].replace(",", "#") + row.split('"')[2])
>>> processed_rdd.take(1)
['0,Italy,Aromas include tropical fruit# broom# brimstone and dried herb. The palate isnt overly expressive# offering unripened apple# citrus and dried sage alongside brisk acidity.,Vulk\xc3\xa0 Bianco,87,,Sicily & Sardinia,Etna,,Kerin O\xe2\x80\x99Keefe,@kerinokeefe,Nicosia 2013 Vulk\xc3\xa0 Bianco (Etna),White Blend,Nicosia']

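I made a few assumptions, because you provided only one sample row.

The first is that the double-quoted string " " is always present, and that it is the column whose commas need to be replaced.

In addition, I assume there is no " in any other column.

I also assume this column does not need its " characters after processing.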
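
If those assumptions turn out not to hold (for example, more than one quoted column per row), a different technique may be safer: parsing each row with Python's standard csv module instead of splitting on the quote character. This is only a sketch, not part of the approach above. The function name replace_commas_in_fields and its replacement parameter are made up for illustration, and it assumes each row is a single line with no embedded newlines.

```python
import csv


def replace_commas_in_fields(row, replacement="#"):
    """Replace commas inside fields, leaving the column separators alone."""
    # csv.reader understands double-quoted fields, so a comma inside
    # quotes stays within one field instead of starting a new column.
    fields = next(csv.reader([row]))
    # Re-join with plain commas; the quotes are dropped, as assumed above.
    return ",".join(field.replace(",", replacement) for field in fields)


# Hypothetical row with two quoted columns.
result = replace_commas_in_fields('0,Italy,"a, b",87,"c, d"')
print(result)
```

Since the function only depends on the standard library, it could in principle be passed to rdd.map in place of the lambda.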

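Explanation

The RDD method map applies a function to every row of the RDD, and the lambda passed to map returns the new row. So here I map this chain of split and replace calls over every row of the RDD (and then, in the example, I take one row).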
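
The same idea can be tried without a Spark session. In the sketch below the lambda is written as a named function that splits each row only once, and Python's built-in map stands in for rdd.map. The function name, the sample rows, and the choice to leave rows without exactly one quoted column unchanged are all assumptions made for illustration.

```python
def replace_commas_in_quoted(row):
    """Replace commas inside the single double-quoted column of a row."""
    parts = row.split('"')
    if len(parts) != 3:
        # Assumption: a row without exactly one quoted column is returned as is.
        return row
    before, quoted, after = parts
    return before + quoted.replace(",", "#") + after


# Hypothetical rows; the built-in map plays the role of rdd.map here.
rows = [
    '0,Italy,"fruit, broom",87',
    '1,Portugal,ripe and fruity,88',
]
processed = list(map(replace_commas_in_quoted, rows))
print(processed)
```

With an RDD, the equivalent call would be rdd.map(replace_commas_in_quoted).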
