Replacing a list of elements with a regular expression

Posted 2024-09-30 14:27:48


I have a text file listing adverbs and their replacements, like this:

adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3

I want to replace these adverbs in my text.

Example:

'Hello adverbe1 this is a test' should become 'Hello replacement1 this is a test'

But I have run out of ideas. My code so far:

adverbes = open("list_adverbes_replacement.txt", encoding="utf-8")
list_adverbes = []
list_replacement = []
for ad in adverbes.readlines():
    if ad != '' and ad.split('|')[0].strip(' ')[-3:] == 'ent':
        list_adverbes.append(ad.split('|')[0].strip(' '))
        list_replacement.append(ad.split('|')[1])
pattern = r"(\s+\b(?:{}))\b".format("|".join(list_adverbes))
data = re.sub(pattern, r"\1", data)

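I can't find a way to replace each adverb with its corresponding replacement.

list_adverbes_replacement.txt is the file shown at the start. I'm looking for a regex solution; I just don't know what I'm missing.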


3 answers

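A concise approach: build a dictionary of key/value pairs for the replacements.

Then use re.sub to replace them: match every word, look it up in the dictionary, and fall back to the word itself if it is not in the dictionary.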

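import re

# Build the adverb -> replacement dictionary from the "adverb |replacement" lines
d = dict()
with open('list_adverbes_replacement.txt', 'r', encoding='utf-8') as fo:
    for line in fo:
        splt = line.split('|')
        d[splt[0].strip()] = splt[1].strip()

s = 'Hello adverbe1 this is a test, adverbe2'
# For every word, use its replacement if there is one, otherwise keep the word
s = re.sub(r'(\w+)', lambda m: d.get(m.group(), m.group()), s)
print(s)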
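
If you would rather keep the approach from the question (one pattern built from the list of adverbs), the missing piece is probably the replacement argument: r"\1" just puts the matched text back. A minimal sketch follows, assuming the mapping is already loaded into a dictionary. The names replacements, build_pattern and replace_adverbs are made up for this illustration.

```python
import re

# Hypothetical in-memory mapping standing in for the parsed file.
replacements = {
    "adverbe1": "replacement1",
    "adverbe2": "replacement2",
    "adverbe3": "replacement3",
}

def build_pattern(mapping):
    # Longest keys first, so a longer key is tried before a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape protects keys that contain regex metacharacters.
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(r"\b(?:{})\b".format(alternation))

def replace_adverbs(text, mapping):
    if not mapping:
        # An empty alternation would match the empty string; nothing to do.
        return text
    pattern = build_pattern(mapping)
    # The callback receives each match and returns its own replacement.
    return pattern.sub(lambda m: mapping[m.group()], text)

print(replace_adverbs("Hello adverbe1 this is a test, adverbe2", replacements))
# Hello replacement1 this is a test, replacement2
```

Compared with matching every \w+ word, this only invokes the callback for words that are actually in the dictionary, and it also works for keys that are not made of word characters only (as long as they start and end with one, so that \b applies).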

Given adverbs like this:

adverbs =  '''adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3'''

Create a dictionary from it, where the key is the adverb and the value is the replacement text:

adverbsDict = {item[0].strip():item[1].strip() for item in map(lambda x: x.split('|'), adverbs.split('\n'))}

Now iterate over the keys and call replace on the text with each key and its corresponding value:

text = 'Hello adverbe1 this is a test'
for key in adverbsDict:
    text = text.replace(key, adverbsDict[key])

Output:

'Hello replacement1 this is a test'
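
One thing to watch with plain str.replace is that it also replaces substrings, so a key can match inside a longer word. The sketch below shows the difference and one possible whole-word variant using re.sub with \b. The helper name replace_whole_words and the sample text are assumptions for this illustration, not part of the answer above.

```python
import re

adverbs_dict = {"adverbe1": "replacement1"}

def replace_whole_words(text, mapping):
    for key, value in mapping.items():
        # \b keeps the key from matching inside a longer word.
        # A function replacement avoids backslash processing in the value.
        text = re.sub(r"\b{}\b".format(re.escape(key)), lambda m, v=value: v, text)
    return text

text = "adverbe1 and adverbe10"
print(text.replace("adverbe1", "replacement1"))  # replacement1 and replacement10
print(replace_whole_words(text, adverbs_dict))   # replacement1 and adverbe10
```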

You can initialize a dictionary with the adverbs and their replacements:

dct = {}
with open(r'__t.txt', 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

dct then looks like {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}
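
The loop above raises an IndexError on a blank line or a line without '|'. If the real file may contain such lines, a slightly more defensive loader could look like the sketch below. load_replacements is a hypothetical helper name, and io.StringIO only stands in for the file so the example is self-contained.

```python
import io

def load_replacements(lines):
    mapping = {}
    for line in lines:
        # Skip blank lines and lines without a separator.
        if "|" not in line:
            continue
        # Split on the first '|' only, so a replacement may contain '|'.
        key, value = line.split("|", 1)
        key, value = key.strip(), value.strip()
        if key:
            mapping[key] = value
    return mapping

# Stand-in for the file contents, including a blank line.
sample = io.StringIO(
    "adverbe1 |replacement1\n\nadverbe2 |replacement2\nadverbe3 |replacement3\n"
)
dct = load_replacements(sample)
print(dct)
# {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}
```

With a real file, the open file object can be passed in directly, since the function only iterates over lines.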

Then pip install triegex (or use the solution from "Speed up millions of regex replacements in Python 3") to simplify building and using the dynamic regex:

import triegex, re

dct = {}
with open(PATH_TO_FILE_WITH_SEARCH_AND_REPLACEMENTS, 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

test = 'Hello adverbe1 this is a test'
pattern = re.compile(fr'\b{triegex.Triegex(*dct.keys()).to_regex()}')
print( pattern.sub(lambda x: dct[x.group()], test) )
# => Hello replacement1 this is a test

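The pattern for this demo dictionary is \b(?:adverbe(?:1\b|2\b|3\b)|~^(?#match nothing)), which matches adverbe1, adverbe2 and adverbe3 as whole words.

lambda x: dct[x.group()] is the replacement argument passed to sub; it looks up the replacement value for each match.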
