Python regex: replace groups with their group names in one pass

2 Answers

User

Answer 1 · edited 2024-06-29 00:01:00

import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

def callback(matchobj):
    return matchobj.lastgroup

result = p.sub(callback, s)
print(result)

yields

the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago
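To see why returning `matchobj.lastgroup` from the callback is enough here, it can help to list what each match reports. The following is a small sketch using the same pattern and input string; the variable name `seen` is only illustrative.

```python
import re

# Same pattern as in the answer above, split across lines for readability.
p = re.compile(
    r'blue (?P<animal>dog|cat)'
    r'|(?P<numberBelowSeven>[0-7])'
    r'|(?P<numberNotSeven>[8-9])'
)
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"

# Pair each matched substring with the name of the group that matched it.
seen = [(m.group(0), m.lastgroup) for m in p.finditer(s)]
for text, name in seen:
    print(repr(text), '->', name)
```

Each alternative of the pattern contains exactly one named group, so `lastgroup` identifies which alternative matched. The "blue" in "blue hats" is left alone because no alternative matches it.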

Note that if you are using Pandas, you can pass the same callback to Series.str.replace:

import pandas as pd

def callback(matchobj):
    return matchobj.lastgroup

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})
pat = r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])'
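# regex=True is required for a callable replacement (the default is regex=False since pandas 2.0)
df['result'] = df['foo'].str.replace(pat, callback, regex=True)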
print(df)

yields

                        foo                                 result
0              the blue dog                             the animal
1  and blue cat wore 7 blue  and animal wore numberBelowSeven blue
2                    hats 9                    hats numberNotSeven
3                  days ago                               days ago

If you have nested named groups, you may need a more elaborate callback that iterates over matchobj.groupdict().items() to collect all the relevant group names:

import pandas as pd

def callback(matchobj):
    names = [groupname for groupname, matchstr in matchobj.groupdict().items()
             if matchstr is not None]
    names = sorted(names, key=lambda name: matchobj.span(name))
    result = ' '.join(names)
    return result

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})

pat=r'blue (?P<animal>dog|cat)|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

# pat=r'(?P<someItem>blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

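# regex=True is required for a callable replacement (the default is regex=False since pandas 2.0)
df['result'] = df['foo'].str.replace(pat, callback, regex=True)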
print(df)

yields

                        foo                                            result
0              the blue dog                                        the animal
1  and blue cat wore 7 blue  and animal wore numberItem numberBelowSeven blue
2                    hats 9                    hats numberItem numberNotSeven
3                  days ago                                          days ago
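The reason `lastgroup` alone is not enough with nested groups can be checked with plain `re`. This is a short sketch reusing the nested pattern above; the variable names `nested`, `m` and `matched` are illustrative only.

```python
import re

# Nested pattern from the answer: numberItem wraps the two digit groups.
nested = re.compile(
    r'blue (?P<animal>dog|cat)'
    r'|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
)

m = nested.search("and blue cat wore 7 blue".replace("blue cat ", ""))

# lastgroup reports only one name: the group that was closed last,
# which for nested groups is the outer one.
print(m.lastgroup)

# groupdict() exposes every named group, so all participating names
# can be collected by skipping the ones that did not match (None).
matched = [name for name, text in m.groupdict().items() if text is not None]
print(matched)
```

Here `lastgroup` names only the enclosing `numberItem` group, while filtering `groupdict()` recovers both the outer and the inner group name, which is what the more elaborate callback relies on.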

User

Answer 2 · edited 2024-06-29 00:01:00

Why not call re.sub() several times?

>>> s = re.sub(r"blue (dog|cat)", "animal", s)
>>> s = re.sub(r"\b[0-7]\b", "numberBelowSeven", s)
>>> s = re.sub(r"\b[8-9]\b", "numberNotSeven", s)
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

You could then put these into a "list of changes" and apply them one by one.
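One way the "list of changes" idea could look is sketched below. This is an assumption about the shape of such a list, not the answerer's exact code; the names `CHANGES` and `apply_changes` are hypothetical.

```python
import re

# Hypothetical "list of changes": (compiled pattern, replacement) pairs,
# applied in the order in which they are listed.
CHANGES = [
    (re.compile(r"blue (dog|cat)"), "animal"),
    (re.compile(r"\b[0-7]\b"), "numberBelowSeven"),
    (re.compile(r"\b[8-9]\b"), "numberNotSeven"),
]

def apply_changes(text, changes=CHANGES):
    """Apply each (pattern, replacement) pair to the text, one after another."""
    for pattern, replacement in changes:
        text = pattern.sub(replacement, text)
    return text

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
result = apply_changes(s)
print(result)
```

Unlike the single-pass callback approach, each later pattern here also sees the text produced by earlier replacements, so the order of the list matters if a replacement string could itself match a later pattern.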

Note that I added word-boundary checks (\b).
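The effect of the word-boundary checks can be seen on input that contains a multi-digit number. A minimal sketch, with a made-up sample string:

```python
import re

text = "17 hats and 7 cats"  # made-up sample containing a two-digit number

# Without \b, every digit in the range 0-7 is replaced, including both digits of "17".
without_boundary = re.sub(r"[0-7]", "numberBelowSeven", text)

# With \b on both sides, only a digit that stands alone as a word is replaced.
with_boundary = re.sub(r"\b[0-7]\b", "numberBelowSeven", text)

print(without_boundary)
print(with_boundary)
```

With the boundaries in place, "17" is left untouched and only the standalone "7" is replaced.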
