Strip stringified lists of unicode strings in a column

Posted 2024-09-30 04:33:38


I have a column in my df that looks like this:

import pandas as pd

df = pd.DataFrame(["[u'one_element']", "[u'two_elememts', u'two_elements']", "[u'three_elements', u'three_elements', u'three_elements']"])

    0
0   [u'one_element']
1   [u'two_elememts', u'two_elements']
2   [u'three_elements', u'three_elements', u'three_elements']

The elements are strings:

type(df[0].iloc[2]) == str

The end result should look like this:

    0
0   one_element
1   two_elememts, two_elements
2   three_elements, three_elements, three_elements

I tried:

df[column] = df[column].map(lambda x: x.lstrip('[u').rstrip(']').replace("u'","").replace("'",""))

but this is obviously slow when there are many rows.

Is there a better way? The df has many columns of different types: strings, integers and floats.

Thanks


3 answers

Use the ast module:

import pandas as pd
import ast
df = pd.DataFrame(["[u'one_element']", "[u'two_elememts', u'two_elements']", "[u'three_elements', u'three_elements', u'three_elements']"])
print(df[0].apply(lambda x: ", ".join(ast.literal_eval(x))))

Output:

0                                       one_element
1                        two_elememts, two_elements
2    three_elements, three_elements, three_elements
Name: 0, dtype: object
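The question mentions that the df also holds integer and float columns. A minimal sketch, assuming every stringified list starts with `[` and ends with `]`, that applies the same literal_eval idea only to non-numeric columns and passes any other value (including missing values) through unchanged; the column names `tags`, `count`, `score` and the helper `unlist` are hypothetical:

```python
import ast

import pandas as pd


def unlist(value):
    # Hypothetical helper: join the items of a stringified list, pass anything else through
    if isinstance(value, str) and value.startswith("[") and value.endswith("]"):
        return ", ".join(ast.literal_eval(value))
    return value


df = pd.DataFrame({
    "tags": ["[u'one_element']", "[u'two_elememts', u'two_elements']", None],
    "count": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
})

# Numeric columns cannot contain these strings, so they are skipped entirely
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].map(unlist)

print(df)
```

`map` is still a Python-level loop, so this does not by itself make the cleaning faster; it only limits the work to the columns that can actually contain stringified lists.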

You can use a regex together with strip, i.e.:

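# regex=True is needed on pandas >= 2.0, where str.replace matches the pattern literally by default
df[0] = df[0].str.strip("[]").str.replace("u'|'", "", regex=True)
print(df[0])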

0                                       one_element
1                        two_elememts, two_elements
2    three_elements, three_elements, three_elements
Name: 0, dtype: object
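Two edge cases are worth checking before relying on quote stripping. The sample values below are hypothetical and may never occur in the question's data: an element ending in `u`, which the `u'` branch of the pattern truncates, and an element containing an apostrophe, which Python 2's repr() wraps in double quotes. A sketch comparing the answer's pattern, a word-boundary variant `\bu'|'` (a substitution, not part of the answer), and ast.literal_eval:

```python
import ast

import pandas as pd

# Hypothetical edge cases: a word ending in "u", and an element containing an apostrophe
s = pd.Series(["[u'menu']", "[u\"it's\", u'plain']"])

# The answer's pattern: the "u'" branch also removes the final letter of 'menu'
loose = s.str.strip("[]").str.replace("u'|'", "", regex=True)

# \b only lets "u'" match where the u starts a word, i.e. the literal u prefix
strict = s.str.strip("[]").str.replace(r"\bu'|'", "", regex=True)

# literal_eval parses the list, so Python itself handles the quoting
parsed = s.apply(lambda x: ", ".join(ast.literal_eval(x)))

print(loose.tolist())   # ['men', 'u"its", plain']
print(strict.tolist())  # ['menu', 'u"its", plain']
print(parsed.tolist())  # ['menu', "it's, plain"]
```

Under these assumptions only literal_eval recovers both values. The word-boundary pattern fixes the first case but not the second, and it would still misfire on an element that is exactly `u`.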

You don't need map; you can use the str accessor of the pandas Series:

(df[0].str.lstrip('[u')
      .str.rstrip(']')
      .str.replace("u'","")
      .str.replace("'",""))

This gives the same result without using map:

0                                       one_element
1                        two_elememts, two_elements
2    three_elements, three_elements, three_elements
Name: 0, dtype: object
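Which of these is fastest depends on the pandas version and on the data, so it is worth measuring on your own frame. A rough sketch using timeit; the 30,000-row Series built by repeating the question's sample strings is hypothetical test data, and `regex=True` / `regex=False` are spelled out so each call behaves the same across pandas versions:

```python
import ast
import timeit

import pandas as pd

# Hypothetical benchmark data: the three sample strings repeated to 30,000 rows
base = ["[u'one_element']", "[u'two_elememts', u'two_elements']",
        "[u'three_elements', u'three_elements', u'three_elements']"]
s = pd.Series(base * 10_000)


def with_ast():
    return s.apply(lambda x: ", ".join(ast.literal_eval(x)))


def with_regex():
    return s.str.strip("[]").str.replace("u'|'", "", regex=True)


def with_str_methods():
    return (s.str.lstrip("[u").str.rstrip("]")
             .str.replace("u'", "", regex=False)
             .str.replace("'", "", regex=False))


# Make sure all three agree before comparing their speed
assert with_ast().tolist() == with_regex().tolist() == with_str_methods().tolist()

for fn in (with_ast, with_regex, with_str_methods):
    print(fn.__name__, timeit.timeit(fn, number=3))
```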

