How do I extract strings in Pandas with a regular expression?

|   | sid | Hobby (times per month) |
|---+-----+-------------------------|
| 0 |   3 | swimming(4)             |
| 1 |   4 | hiking (1 )             |
| 2 |   2 | running ( 12 )          |
| 3 |   5 | fishing ( 2 )           |

|   | sid | Hobby (times per month) |
|---+-----+-------------------------|
| 0 |   3 | swimming                |
| 1 |   4 | hiking                  |
| 2 |   2 | running                 |
| 3 |   5 | fishing                 |

3 answers

User

#1 · edited 2024-09-26 18:07:28

To apply a regex to a pandas column, you can use the column's apply() method with a helper function:

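import re

import pandas as pd

# A word, optional whitespace, then a parenthesised (possibly empty) number.
regexp_matcher = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

def remove_brackets(string):
    # Return the captured word, or the input unchanged if nothing matches.
    part = regexp_matcher.findall(string)
    if not part:
        return string
    return part[0]

df = pd.DataFrame()
df['string'] = ['swimming(4)', 'swimming(4)', 'swimming(4)']
df['new_string'] = df['string'].apply(remove_brackets)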
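
As a possible alternative to apply(), the same pattern can be passed to the vectorized Series.str.replace method, replacing the whole match with its first capture group. This is only a sketch: the sample values are taken from the question, and 'chess' is a made-up extra value added to show that strings without brackets pass through unchanged.

```python
import pandas as pd

# Same pattern as above; group 1 is the hobby name.
PATTERN = r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$'

# Values from the question, plus 'chess' (hypothetical) with no brackets.
df = pd.DataFrame({'string': ['swimming(4)', 'hiking (1 )', 'running ( 12 )', 'chess']})

# Replace the whole match with group 1; non-matching strings are left as-is.
df['new_string'] = df['string'].str.replace(PATTERN, r'\1', regex=True)

print(df['new_string'].tolist())
# ['swimming', 'hiking', 'running', 'chess']
```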

User

#2 · edited 2024-09-26 18:07:28

For example, to change swimming(4) into swimming, you can use the following regular expression:

^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$

Demo: https://regex101.com/r/sTO1Q9/1

Test cases:

swimming(4)
hiking   (1 )
running ( 12 )
fishing( 2 )
hiking(1)

Matches:

Match 1
Full match  0-11    `swimming(4)`
Group 1.    0-8 `swimming`
Match 2
Full match  12-25   `hiking   (1 )`
Group 1.    12-18   `hiking`
Match 3
Full match  26-40   `running ( 12 )`
Group 1.    26-33   `running`
Match 4
Full match  41-53   `fishing( 2 )`
Group 1.    41-48   `fishing`
Match 5
Full match  54-64   `hiking(1) `
Group 1.    54-60   `hiking`
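
The matches listed above can be reproduced in plain Python with the re module. This is a minimal sketch, assuming the five test cases are joined by newlines into one string (as in the demo) and that the re.MULTILINE flag is set so that ^ and $ apply to each line.

```python
import re

# Pattern from the answer; re.MULTILINE makes ^ and $ match at every line.
pattern = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$', re.MULTILINE)

# The five test cases, one per line (the last has a trailing space).
test_cases = "swimming(4)\nhiking   (1 )\nrunning ( 12 )\nfishing( 2 )\nhiking(1) "

matches = list(pattern.finditer(test_cases))
names = [m.group(1) for m in matches]   # captured hobby names
spans = [m.span() for m in matches]     # (start, end) of each full match

print(names)
# ['swimming', 'hiking', 'running', 'fishing', 'hiking']
print(spans)
# [(0, 11), (12, 25), (26, 40), (41, 53), (54, 64)]
```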

User

#3 · edited 2024-09-26 18:07:28

You can use the .str accessor methods to match strings in pandas:

df.columns = ['sid', 'Hobby']
# expand=False returns a Series rather than a one-column DataFrame.
df['Hobby'] = df['Hobby'].str.extract(r'(\w*)', expand=False)
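
To try the snippet above end to end, the sketch below first rebuilds the DataFrame from the question's table (the constructor call is an assumption about how the data was created) and then applies the same str.extract call.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    'sid': [3, 4, 2, 5],
    'Hobby (times per month)': [
        'swimming(4)', 'hiking (1 )', 'running ( 12 )', 'fishing ( 2 )',
    ],
})

df.columns = ['sid', 'Hobby']
# (\w*) captures the leading run of word characters in each value.
df['Hobby'] = df['Hobby'].str.extract(r'(\w*)', expand=False)

print(df['Hobby'].tolist())
# ['swimming', 'hiking', 'running', 'fishing']
```

Because \w* can also match an empty string, a value that begins with a non-word character (such as a bare "(4)") would come back as an empty string rather than a missing value.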
