Regular expressions on a DataFrame in Python

Posted 2024-09-28 22:41:27

I'm trying to extract names from a DataFrame column:

df['target_name'].head()

3                             Minnie
4     Albert [unclear]Gles[/unclear]
5      Eliza [unclear]Gles[/unclear]
6                      John Slaltery
7     [unclear]P.[/unclear] Slaltery
23                         ? Stewart
34                     John Maddison
35                     Herbert Olney
36                   William Iverach
37               [unclear][/unclear]
38                  Peter Blacksmith
39                    William Oliver
40                             Emily
Name: target_name, dtype: object

That is the output. I only want to strip the unnecessary characters and keep the names. Here is what I did:

import re
df['target_name'] = df['target_name'].astype(str) #converting it into a string. 

I tried both of the approaches below, but each gives me the same output: NaN.

df['target_name'] = df['target_name'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
df['target_name3'] = df['target_name'].str.replace(r'\([^)]*\)', '').str.strip()
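To see where the two attempts above go wrong, it can help to run each pattern on a few of the sample values. This is a minimal sketch, assuming pandas 2.x (where str.replace only treats the pattern as a regular expression when regex=True is passed); the variable names are hypothetical.

```python
import pandas as pd

# A few values copied from the printed column above
s = pd.Series(["Minnie",
               "Albert [unclear]Gles[/unclear]",
               "[unclear]P.[/unclear] Slaltery",
               "? Stewart"])

# Attempt 1: extract() keeps only the FIRST run of letters/spaces in each value,
# so everything after the first "[" is lost, and a value that starts with a tag
# yields the tag name itself
first_run = s.str.extract(r'([a-zA-Z ]+)', expand=False).str.strip()
print(first_run.tolist())

# Attempt 2: this pattern matches text inside round parentheses "( ... )",
# but the markup uses square brackets, so nothing is removed
parens_removed = s.str.replace(r'\([^)]*\)', '', regex=True).str.strip()
print(parens_removed.tolist())
```

On these sample values neither pattern returns NaN: the first one returns ['Minnie', 'Albert', 'unclear', 'Stewart'] and the second leaves every value unchanged. str.extract only yields NaN for a value in which the pattern finds no match at all (an empty string, for example).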

1 answer
Anonymous user
#1 · Posted 2024-09-28 22:41:27

This seems to work for me:

import pandas as pd

target_name = ["Minnie", "Albert [unclear]Gles[/unclear]",
               "Eliza [unclear]Gles[/unclear]",
               "[unclear]P.[/unclear] Slaltery", "? Stewart"]
df = pd.DataFrame(target_name, columns=['target_name'])
# regex=True is needed on pandas >= 2.0, where str.replace matches the pattern literally by default
df['target_name'] = (df['target_name'].astype(str)
                     .str.replace(r'\/|\?', '', regex=True)       # remove "/" and "?"
                     .str.replace(r'\[[a-z]+\]', '', regex=True)  # remove tags such as [unclear]
                     .str.strip())
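The two chained replacements can also be folded into one pattern that removes opening tags, closing tags and question marks in a single pass, followed by a step that collapses leftover spaces. This is only a sketch of an alternative, not the method above; it assumes the markup only ever uses lowercase tags such as [unclear] and [/unclear], and the column name clean_name is hypothetical.

```python
import pandas as pd

target_name = ["Minnie", "Albert [unclear]Gles[/unclear]",
               "Eliza [unclear]Gles[/unclear]", "John Slaltery",
               "[unclear]P.[/unclear] Slaltery", "? Stewart",
               "[unclear][/unclear]"]
df = pd.DataFrame({'target_name': target_name})

# Assumption: tags look like [word] or [/word] with lowercase letters only.
# \[/?[a-z]*\] matches an opening or closing tag; \? matches a literal "?".
tag_or_qmark = r'\[/?[a-z]*\]|\?'

df['clean_name'] = (df['target_name'].astype(str)
                    .str.replace(tag_or_qmark, '', regex=True)  # drop tags and "?"
                    .str.replace(r'\s+', ' ', regex=True)       # collapse repeated whitespace
                    .str.strip())                                # trim both ends

print(df['clean_name'].tolist())
```

Unlike the answer's first replacement, this keeps a "/" that is not part of a closing tag. A row that was entirely markup, such as [unclear][/unclear], ends up as an empty string, which you may want to treat as missing afterwards.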
