Data preprocessing in Python with Pandas

User
#1 · Edited 2024-06-28 18:47:46

You can use re, Python's regular-expression package. Here is one solution:
import pandas as pd
import re
Generate the test data:
data = [
    [1, '[[str1],[str2],[str3]]'],
    [2, '[[str4],[str5]]'],
    [3, '[[str1]]'],
    [4, '[[str8]]'],
    [5, '[[str9]]'],
    [6, '[[str4]]'],
]
Convert the data to a DataFrame:
df = pd.DataFrame(data, columns=['id', 'value'])
print(df)
Remove "[" and "]" from the "value" column (the raw string keeps the backslashes intact):
df['value'] = df['value'].apply(lambda s: re.sub(r"[\[\]]", "", s))
print(df)
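
To see what the pattern does on its own, here is a minimal sketch on plain strings that uses only the standard library. The helper name strip_brackets is made up for illustration and is not part of the answer above.

```python
import re

# Character class that matches a literal "[" or "]".
# A raw string avoids invalid-escape warnings on newer Python versions.
BRACKETS = re.compile(r"[\[\]]")

def strip_brackets(text):
    """Return text with every '[' and ']' removed (hypothetical helper)."""
    return BRACKETS.sub("", text)

samples = ['[[str1],[str2],[str3]]', '[[str4],[str5]]', '[[str1]]']
cleaned = [strip_brackets(s) for s in samples]
print(cleaned)  # ['str1,str2,str3', 'str4,str5', 'str1']
```

The commas survive because the pattern only targets the two bracket characters.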

User
#2 · Edited 2024-06-28 18:47:46

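It looks like you have a Series of lists. You can try flattening them and joining the elements:
df['value'] = df['value'].apply(lambda x: ','.join([e for l in x for e in l]))
Or:
from itertools import chain
df['value'] = df['value'].apply(lambda x: ','.join(chain.from_iterable(x)))
NB. If you get an error, please post it along with the column types (df.dtypes).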
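
The flattening approach only behaves as intended when each cell holds a real list of lists. The sketch below, with made-up sample values and only the standard library, contrasts that case with the same data stored as text.

```python
from itertools import chain

nested = [['str1'], ['str2'], ['str3']]   # a real list of lists
as_text = '[[str1],[str2],[str3]]'        # the same data stored as a string

# Flatten one level, then join the elements with commas.
flat = ','.join(chain.from_iterable(nested))
print(flat)  # str1,str2,str3

# Iterating over a string yields single characters, so the same call
# only puts a comma between every character instead of flattening.
not_flat = ','.join(chain.from_iterable(as_text))
print(not_flat == ','.join(as_text))  # True
```

If the second result looks familiar, the column most likely contains strings, and a string replacement is the better fit.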

User
#3 · Edited 2024-06-28 18:47:46

As far as I can see, your data matches this sample:

Sample data:

df = pd.DataFrame({'id':[1,2,3,4,5,6], 'value':['[[str1],[str2],[str3]]', '[[str4],[str5]]', '[[str1]]',  '[[str8]]', '[[str9]]', '[[str4]]']})
print(df)
   id                   value
0   1  [[str1],[str2],[str3]]
1   2         [[str4],[str5]]
2   3                [[str1]]
3   4                [[str8]]
4   5                [[str9]]
5   6                [[str4]]

Result:

df['value'] = df['value'].astype(str).str.replace('[', '', regex=False).str.replace(']', '', regex=False)
print(df)
   id           value
0   1  str1,str2,str3
1   2       str4,str5
2   3            str1
3   4            str8
4   5            str9
5   6            str4

Note: the error AttributeError: Can only use .str accessor with string values means the column is not being treated as str. Cast it with astype(str) first, then run the replacement.
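
As a rough illustration of that error and the cast, the sketch below uses a hypothetical integer Series rather than the data from the question; it assumes pandas is installed.

```python
import pandas as pd

s = pd.Series([10, 20, 30])  # integer dtype, not strings (made-up data)

error = None
try:
    # The .str accessor is unavailable on a numeric Series.
    s.str.replace('0', '', regex=False)
except AttributeError as exc:
    error = str(exc)
    print(error)

# Casting to str first makes the string methods available.
converted = s.astype(str).str.replace('0', '', regex=False)
print(converted.tolist())  # ['1', '2', '3']
```

Passing regex=False treats the search text literally, which matters for characters such as "[" that have a special meaning in regular expressions.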
