Ignoring parentheses and the first occurrence with re.sub

Posted 2024-10-01 15:30:50


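I am working on a regex pattern that replaces items matching the keys of a dictionary. However, the code I wrote replaces every match. Is there a way to ignore anything inside parentheses and also skip the first occurrence of each key? A sample text is shown below.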

Input:

s = " SHOO (/ˈshuː/ suhuu) is derivered from Shi Hoo oop our something. SHOO represents title. fu oop our ( FOO ) prefers the name TOP-SHOO.[3] SHOO is one of FOO.Tu REST (tREST) means empty. tREST differs with REST. Doot Ooop Our sour (DOOs) is also means bla. DOOs are friendly."

Expected output:

" SHOO (/ˈshuː/ suhuu) is derivered from Shi Hoo oop our is something. Shi Hoo oop our represents title. fu oop our ( FOO ) prefers the name TOP-SHOO.[3] Shi Hoo oop our is one of fu oop our.Tu REST (tREST) means empty. tu REST differs with REST. Doot Ooop Our sour (DOOs) is also means bla. Doot Ooop Our sour are friendly."
Current code:

import re

d = {
'tREST':'tu REST',
'FOO': 'fu oop our',
'SHOO': 'Shi Hoo oop our',
'DOOs': 'Doot Ooop Our sour',
'TOP-SHOO' : None
}

for k, v in d.items():
    if v is None:
        d[k] = k

pattern = re.compile(r'\b(' + '|'.join(d.keys()) + r')\b')

result = pattern.sub(lambda x: d[x.group()], ' '.join(s.split()))


1 Answer
User
#1 · Posted 2024-10-01 15:30:50

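OK, here is one approach. The idea is to use the newer regex module (a third-party package, not the standard library re), which can skip anything inside parentheses via (*SKIP)(*FAIL), and to store a tuple of value and count for each key instead of just the value. Finally, a replacement function does the counting and decides whether to substitute: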

import regex as re

# make a tuple out of it
d = {
    'tREST':    ('tu REST', 0),
    'FOO':      ('fu oop our', 0),
    'SHOO':     ('Shi Hoo oop our', 0),
    'DOOs':     ('Doot Ooop Our sour', 0),
    'TOP-SHOO': (None, 0)
}

# clear out Nones
for k, v in d.items():
    if v[0] is None:
        d[k] = (k, 0)

# pattern with r- and f-strings; (?:...) makes both \b anchors apply to every key
pattern = re.compile(rf'''
    \([^()]+\)(*SKIP)(*FAIL)
    |
    \b(?:{"|".join(d.keys())})\b
''', re.VERBOSE)

# here comes the magic
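def replacer(match):
    key = match.group(0)
    result = key  # fall back to the matched text if the key is unknown
    try:
        value, cnt = d[key]
        # keep the first occurrence, replace every later one
        result = value if cnt else key
        d[key] = (value, cnt + 1)
    except KeyError:
        pass
    return result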

output = pattern.sub(replacer, s)
print(output)
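
If installing the regex module is not an option, the same skip-the-parentheses idea can be approximated with the standard library. The following is only a sketch under that assumption, not the approach used above: the parenthesised chunk is matched by its own capturing group and returned unchanged, which plays the role of (*SKIP)(*FAIL). The names `replace_after_first` and `seen` are made up for illustration.

```python
import re

# Same mapping as above; None means "keep the key itself".
d = {
    'tREST': 'tu REST',
    'FOO': 'fu oop our',
    'SHOO': 'Shi Hoo oop our',
    'DOOs': 'Doot Ooop Our sour',
    'TOP-SHOO': None,
}

# Group 1 captures a parenthesised chunk, group 2 captures a dictionary key.
keys = "|".join(re.escape(k) for k in d)
pattern = re.compile(rf"(\([^()]+\))|\b({keys})\b")


def replace_after_first(text):
    """Hypothetical helper: replace each key except its first occurrence."""
    seen = set()  # keys already met outside parentheses

    def replacer(match):
        if match.group(1) is not None:
            # Parenthesised text is consumed and returned untouched.
            return match.group(1)
        key = match.group(2)
        if key in seen:
            return d[key] or key  # a None value maps the key to itself
        seen.add(key)
        return key

    return pattern.sub(replacer, text)


print(replace_after_first("FOO (FOO) FOO FOO"))
```

Keeping the state in a local set rather than in the dictionary means the function can be called on several texts without resetting any counters.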

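What is a bit unclear is how you want to handle a case such as bla bla bla (FOO). bla FOO bla bla. Should the FOO outside the parentheses be replaced because it counts as the second occurrence, or left alone because anything between parentheses is ignored?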
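
The expected output in the question replaces the FOO that follows ( FOO ), which suggests the first reading. A sketch of that interpretation, using only the standard library re module, could look like the following. It assumes that a key appearing inside parentheses counts as its first mention while the parenthesised text itself stays untouched; `expand_after_introduction`, `key_re` and `token_re` are hypothetical names, and 'TOP-SHOO' is mapped to itself for brevity.

```python
import re

d = {
    'tREST': 'tu REST',
    'FOO': 'fu oop our',
    'SHOO': 'Shi Hoo oop our',
    'DOOs': 'Doot Ooop Our sour',
    'TOP-SHOO': 'TOP-SHOO',
}

keys = "|".join(re.escape(k) for k in d)
key_re = re.compile(rf"\b(?:{keys})\b")                  # a key on its own
token_re = re.compile(rf"\([^()]+\)|\b(?:{keys})\b")     # parenthesised chunk or key


def expand_after_introduction(text):
    """Hypothetical helper: a key seen once (even in parentheses) is expanded later."""
    seen = set()

    def replacer(match):
        token = match.group(0)
        if token.startswith("("):
            # Assumption: keys inside parentheses count as the first mention.
            seen.update(key_re.findall(token))
            return token
        if token in seen:
            return d[token]
        seen.add(token)
        return token

    return token_re.sub(replacer, text)


print(expand_after_introduction("fu oop our ( FOO ) is one of FOO."))
```

Under the other reading, where parenthesised text is ignored entirely, the `seen.update(...)` line would simply be dropped.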


Possible optimization

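You can keep the initial dict as it is, since we loop over it anyway, and build the tuples from its values at that point. This is probably easier to maintain (when adding new replacements, that is):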

# a dict
d = {
    'tREST':    'tu REST',
    'FOO':      'fu oop our',
    'SHOO':     'Shi Hoo oop our',
    'DOOs':     'Doot Ooop Our sour',
    'TOP-SHOO':  None
}

# clear out Nones
for k, v in d.items():
    if v is None:
        d[k] = (k, 0)
    else:
        d[k] = (v, 0)
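
A further variation, offered only as a sketch and not as part of the answer above, is to leave the mapping untouched and keep the counts in a separate collections.Counter. The names `replacements`, `counts` and `lookup` are hypothetical.

```python
from collections import Counter

d = {
    'tREST': 'tu REST',
    'FOO': 'fu oop our',
    'SHOO': 'Shi Hoo oop our',
    'DOOs': 'Doot Ooop Our sour',
    'TOP-SHOO': None,
}

# Resolve None values once; the mapping itself never changes afterwards.
replacements = {k: (k if v is None else v) for k, v in d.items()}
counts = Counter()  # how often each key has been looked up so far


def lookup(key):
    """Return the key itself on its first lookup, its replacement afterwards."""
    counts[key] += 1
    return key if counts[key] == 1 else replacements[key]


print(lookup('FOO'), '|', lookup('FOO'))
```

A replacement function passed to `pattern.sub` could then call `lookup(match.group(0))`, and resetting the state between texts is a matter of calling `counts.clear()`.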
