Replace all consecutive repeated letters, ignoring specific words

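import itertools

sentence = 'hello, join this meeting heere using thiis lllink'
keepWord = ['hello', 'meeting']

new_sentence = ''
for word in sentence.split():
    if word not in keepWord:
        # collapse each run of identical characters to a single character
        new_word = ''.join(c[0] for c in itertools.groupby(word))
        # append to new_sentence (the original appended to sentence, discarding earlier words)
        new_sentence = new_sentence + " " + new_word
    else:
        new_sentence = new_sentence + " " + word

# note: split() yields 'hello,' with the comma attached, so it does not match 'hello' in keepWord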

2 Answers

User

#1 · Edited 2024-09-29 23:32:08

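It is not especially compact, but here is a fairly simple example using regular expressions: the function subst collapses each run of a repeated character to a single character, and re.sub calls that function for every word it finds.

This assumes that, because the keepWord list in your example (where it is first mentioned) has Hello in title case while the text has hello in lowercase, you want a case-insensitive comparison against the list. It therefore works the same whether your sentence contains Hello or hello.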

import re

sentence = 'hello, join this meeting heere using thiis lllink'
keepWord = ['Hello','meeting']

keepWord_s = set(word.lower() for word in keepWord)

def subst(match):
    word = match.group(0)
    return word if word.lower() in keepWord_s else re.sub(r'(.)\1+', r'\1', word)

print(re.sub(r'\b.+?\b', subst, sentence))

Output:

hello, join this meeting here using this link
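One caveat with the pattern above: \b.+?\b also matches the gaps between words, so a run of repeated spaces or punctuation sitting between two words is collapsed as well. A minimal sketch of a variant that only visits runs of word characters is shown below; the function name collapse_repeats and its parameters are made up for illustration, and note that \w also covers digits and underscores, so a number such as 1000 would still be shortened.

```python
import re

def collapse_repeats(sentence, keep_words):
    """Collapse repeated characters in every word not listed in keep_words.

    Hypothetical helper: the name and signature are illustrative only.
    The keep-list comparison is case-insensitive, as in the answer above.
    """
    keep = {w.lower() for w in keep_words}

    def subst(match):
        word = match.group(0)
        if word.lower() in keep:
            return word  # protected word: leave untouched
        # replace each run of one repeated character with a single copy
        return re.sub(r'(.)\1+', r'\1', word)

    # \w+ visits only the words, so spacing and punctuation are never rewritten
    return re.sub(r'\w+', subst, sentence)

print(collapse_repeats('hello, join this meeting heere using thiis lllink',
                       ['Hello', 'meeting']))
```

Matching \w+ instead of \b.+?\b is a design choice rather than a correction: for the sample sentence both give the same result.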

User

#2 · Edited 2024-09-29 23:32:08

You can match the whole words from the keepWord list and, in every other context, replace only sequences of two or more identical letters:

import re
sentence = 'hello, join this meeting heere using thiis lllink'
keepWord = ['hello','meeting']
new_sentence = re.sub(fr"\b(?:{'|'.join(keepWord)})\b|([^\W\d_])\1+", lambda x: x.group(1) or x.group(), sentence)
print(new_sentence)
# => hello, join this meeting here using this link

See the Python demo.

The regex looks like

\b(?:hello|meeting)\b|([^\W\d_])\1+

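See the regex demo. If Group 1 matched, its value is returned; otherwise the whole match (a word to keep) is returned.

Pattern details

\b(?:hello|meeting)\b - hello or meeting enclosed in word boundaries
| - or
([^\W\d_]) - Group 1: any Unicode letter
\1+ - one or more backreferences to the Group 1 value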
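If the keep words come from user input, the pattern could be built a little more defensively. The sketch below is one possible way to do that, not part of the original answer: the function name collapse_outside_keep_words is hypothetical, re.escape guards against regex metacharacters in the keep words, and the scoped inline flag (?i:...) (Python 3.6+) makes only the keep-word branch case-insensitive, so the backreference still requires an identical letter.

```python
import re

def collapse_outside_keep_words(sentence, keep_words):
    """Collapse runs of identical letters except inside whole keep words.

    Hypothetical helper: the name and signature are illustrative only.
    """
    letter_run = r"([^\W\d_])\1+"  # Group 1: a letter, then one or more repeats of it
    if keep_words:
        # escape each keep word so characters like '.' are matched literally
        alternation = '|'.join(re.escape(w) for w in keep_words)
        # (?i:...) limits case-insensitivity to the keep-word branch
        pattern = rf"\b(?i:{alternation})\b|{letter_run}"
    else:
        # an empty alternation would match the empty string, so skip that branch
        pattern = letter_run
    # Group 1 is set only for a letter run; otherwise keep the matched word as-is
    return re.sub(pattern, lambda m: m.group(1) or m.group(), sentence)

print(collapse_outside_keep_words(
    'Hello, join this meeting heere using thiis lllink',
    ['hello', 'meeting']))
```

Keeping the letter-run branch case-sensitive means a name such as Aaron is left alone, whereas a global re.IGNORECASE flag would treat "Aa" as a repeated letter.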
