Filtering rows with special characters in a DataFrame

Posted 2024-10-03 06:19:06

I have the following DataFrame named data:

    metrics    artists

0    0.21    ['ZhanÃ©']
2    0.14    ['Mose Allison']
3    0.87    ['水柳仙']
4    0.25    ['Shel Silverstein']

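Some records in the artists column contain special characters. I want to move the records with special characters into a second df, i.e. get the following output: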

data:

     metrics    artists

0    0.14    ['Mose Allison']
1    0.25    ['Shel Silverstein']

data2:

     metrics    artists

0    0.21    ['ZhanÃ©']
1    0.87    ['水柳仙']

I used:

 data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

But I get the original df back.

I also tried:

data2 = []
for x in data['artists']:
    if x is not "[^a-zA-Z0-9 ]":
         data2[x]=data[x]
    print(data2)

But that gives me an error:

KeyError: "['ZhanÃ©']"

And:

if x is "[^ a-zA-Z0-9]"

which returns no records.

1 Answer

User

#1 · Posted 2024-10-03 06:19:06

use:
data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]
but I get the original df,

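You are missing a space in "[^a-zA-Z0-9]", which is why you get the original df. Without the space in the character class, the space in a name such as 'Mose Allison' counts as a special character, so every row matches. Tested with Python 3 in a Jupyter notebook.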
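
A minimal sketch of the fix, rebuilding the sample from the question. It assumes the artists column holds strings that look like lists (as they would after loading from a CSV), not real Python lists. Under that assumption the brackets, quotes and commas also have to be allowed in the character class, next to the space. The names `pattern`, `has_special`, `plain` and `special` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question. Assumption: each artists value is a
# string such as "['Mose Allison']", not an actual list object.
data = pd.DataFrame(
    {
        "metrics": [0.21, 0.14, 0.87, 0.25],
        "artists": ["['ZhanÃ©']", "['Mose Allison']", "['水柳仙']", "['Shel Silverstein']"],
    },
    index=[0, 2, 3, 4],
)

# Anything that is not an ASCII letter, a digit, a space, or list punctuation
# ([ ] ' ,) is treated as a special character.
pattern = r"[^a-zA-Z0-9 \[\]',]"

# Boolean mask: True for rows whose artists string contains a special character.
has_special = data["artists"].str.contains(pattern, regex=True)

special = data[has_special].reset_index(drop=True)   # the wanted data2
plain = data[~has_special].reset_index(drop=True)    # the wanted data

print(plain)
print(special)
```

Indexing the whole frame with the mask (`data[has_special]`) keeps the metrics column, whereas `data.artists[...]` returns only the artists Series. If the column holds bare names without brackets or quotes, "[^a-zA-Z0-9 ]" alone should be enough.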