Replacing a list of elements with a regular expression

adverbes = open("list_adverbes_replacement.txt", encoding="utf-8")
list_adverbes = []
list_replacement = []
for ad in adverbes.readlines():
    if ad != '' and ad.split('|')[0].strip(' ')[-3:] == 'ent':
        list_adverbes.append(ad.split('|')[0].strip(' '))
        list_replacement.append(ad.split('|')[1])
pattern = r"(\s+\b(?:{}))\b".format("|".join(list_adverbes))
data = re.sub(pattern, r"\1", data)

3 Answers

User

#1 · Edited 2024-09-30 14:27:48

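A concise approach: build a dictionary of key/value pairs for the replacements.

Then substitute them with re.sub: match every word, look it up in the dictionary, and fall back to the word itself when it is not in the dictionary.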

import re

d = dict()
with open('list_adverbes_replacement.txt', 'r', encoding='utf-8') as fo:
    for line in fo:
        splt = line.split('|')
        d[splt[0].strip()] = splt[1].strip()

s = 'Hello adverbe1 this is a test, adverbe2'
s = re.sub(r'(\w+)', lambda m: d.get(m.group(), m.group()), s)
print(s)
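
A minimal sketch of the same idea with slightly more defensive parsing. The helper names load_replacements and replace_words are hypothetical, and the sample lines stand in for the contents of list_adverbes_replacement.txt, whose format is assumed from the question (one adverb|replacement pair per line). Blank or malformed lines are skipped instead of raising an IndexError.

```python
import re


def load_replacements(lines):
    """Build {adverb: replacement} from lines shaped like 'adverb|replacement'."""
    mapping = {}
    for line in lines:
        key, sep, value = line.partition('|')
        key, value = key.strip(), value.strip()
        if sep and key:  # skip blank lines and lines without a '|'
            mapping[key] = value
    return mapping


def replace_words(text, mapping):
    """Look up every word token in mapping; unknown words are returned unchanged."""
    return re.sub(r'\w+', lambda m: mapping.get(m.group(), m.group()), text)


# Sample lines are an assumption standing in for the real file contents
sample_lines = ['adverbe1 |replacement1\n', '\n', 'adverbe2 |replacement2\n']
d = load_replacements(sample_lines)
print(replace_words('Hello adverbe1 this is a test, adverbe2', d))
```

Because \w+ only matches single words, keys that contain spaces or hyphens would never be found with this approach.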

User

#2 · Edited 2024-09-30 14:27:48

Given adverbs like these:

adverbs =  '''adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3'''

Build a dictionary from it, with the adverb as the key and the replacement text as the value:

adverbsDict = {item[0].strip():item[1].strip() for item in map(lambda x: x.split('|'), adverbs.split('\n'))}

Now iterate over the keys and call replace on the text with each key and its corresponding value:

text = 'Hello adverbe1 this is a test'
for key in adverbsDict:
    text = text.replace(key, adverbsDict[key])

Output:

'Hello replacement1 this is a test'
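
One caveat with str.replace is that it matches substrings, so a key such as adverbe1 is also replaced inside a longer word like adverbe10. The sketch below contrasts the loop with a whole-word variant. The function names and the sample word adverbe10 are made up for illustration, and the \b anchors assume that every key begins and ends with a letter, digit, or underscore.

```python
import re

adverbs_dict = {'adverbe1': 'replacement1'}
text = 'adverbe10 and adverbe1'


def replace_substrings(text, mapping):
    # Plain str.replace: also rewrites the key where it is part of a longer word
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


def replace_whole_words(text, mapping):
    for key, value in mapping.items():
        # re.escape keeps any regex metacharacters in the key literal
        pattern = r'\b' + re.escape(key) + r'\b'
        # A function replacement avoids backslash interpretation in value
        text = re.sub(pattern, lambda m, v=value: v, text)
    return text


print(replace_substrings(text, adverbs_dict))   # replacement10 and replacement1
print(replace_whole_words(text, adverbs_dict))  # adverbe10 and replacement1
```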

User

#3 · Edited 2024-09-30 14:27:48

You can initialize a dictionary with the adverbs and their replacements:

dct = {}
with open(r'__t.txt', 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

dct then looks like {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}.

Then pip install triegex (or use the solution from Speed up millions of regex replacements in Python 3) to simplify building and using the dynamic regular expression:

import triegex, re

dct = {}
with open(PATH_TO_FILE_WITH_SEARCH_AND_REPLACEMENTS, 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

test = 'Hello adverbe1 this is a test'
pattern = re.compile(fr'\b{triegex.Triegex(*dct.keys()).to_regex()}')
print( pattern.sub(lambda x: dct[x.group()], test) )
# => Hello replacement1 this is a test

For this demo dictionary the pattern is \b(?:adverbe(?:1\b|2\b|3\b)|~^(?#match nothing)), which matches adverbe1, adverbe2, and adverbe3 as whole words.

lambda x: dct[x.group()] is the replacement argument passed to the sub call; it looks up the replacement value for each match.
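
If installing triegex is not an option, a plain alternation built from the dictionary keys should give the same whole-word behavior, although without the trie optimization it may be slower for very large word lists. This is a sketch rather than what triegex does internally: build_pattern is a hypothetical helper, the dictionary is the demo one from this answer, and it is assumed to be non-empty.

```python
import re

dct = {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}


def build_pattern(words):
    # Assumes at least one word; an empty alternation would match empty strings.
    # Longest keys first, so a longer key is tried before any shorter prefix of it.
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r'\b(?:' + '|'.join(alternatives) + r')\b')


pattern = build_pattern(dct)
# Every match is one of the dictionary keys, so dct[m.group()] cannot raise KeyError
result = pattern.sub(lambda m: dct[m.group()], 'Hello adverbe1 this is a test, adverbe3')
print(result)
```

Compiling the pattern once and reusing it is worthwhile when the same word list is applied to many strings.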
