Regex to split two words joined by a period

a = [' boys.aussi', 'interpretation.une', 'amour.hugh', 'amy.com', 'frenchemabassy.net']
alphabet = "([a-z][...])"
alphabets = "([A-Za-z])"
prefixes = "(Mr|St|Mrs|Ms|Dr)[.]"
suffixes = "(Inc|Ltd|Jr|Sr|Co)[.]"
starters = "(M|Mr|Mme|Sr|Dr)"
acronyms = "([A-Z][.][A-Z][.](?:[A-Z][.])?)"
websites = "[.](com|net|org|io|gov)"
digits = "([0-9])"

# separates the sentences
def normalize(text):  # do_lower=False):
    text = re.sub(alphabets + "[.]" + alphabets,)
    return text

normalize(a)

1 answer

User

#1 · Posted 2024-06-13 12:11:07

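Use a negative lookahead assertion in the regex so that "." is replaced with ". " (a period followed by a space) only when it is not followed by one of the special Internet top-level domains: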

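import re

def normalize(text):
    # Replace '.' with '. ' unless the dot is immediately followed by one of
    # the listed TLDs. (?!...) is a negative lookahead: it checks the text
    # after the dot without consuming it.
    return re.sub(r'\.(?!(com|net|org|io|gov))', '. ', text)

a = [' boys.aussi', 'interpretation.une', 'amour.hugh', 'amy.com', 'frenchemabassy.net']
# normalize() works on a single string, so apply it to each list element
a = [normalize(s) for s in a]
print(a)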

Prints:

[' boys. aussi', 'interpretation. une', 'amour. hugh', 'amy.com', 'frenchemabassy.net']
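One thing to watch: the lookahead only checks that the text after the dot does not start with a listed TLD, so a word such as "community" (which begins with "com") is also left unsplit. A minimal sketch, assuming you want only a whole TLD to block the replacement, adds a \b word boundary after the alternation. The names ORIGINAL and WHOLE_TLD are just illustrative:

```python
import re

# The answer's pattern: the lookahead only checks that the text after the
# dot does not *start with* one of the listed TLDs.
ORIGINAL = re.compile(r'\.(?!(com|net|org|io|gov))')

# Hypothetical refinement: \b requires the TLD to end at a word boundary,
# so only a whole TLD such as "com" (not "community") blocks the split.
WHOLE_TLD = re.compile(r'\.(?!(?:com|net|org|io|gov)\b)')

for s in ['amy.com', 'boys.community', 'amour.organ']:
    print(repr(ORIGINAL.sub('. ', s)), repr(WHOLE_TLD.sub('. ', s)))
# 'amy.com' 'amy.com'
# 'boys.community' 'boys. community'
# 'amour.organ' 'amour. organ'
```

The (?:...) group is non-capturing because nothing needs to be referenced from it in the replacement.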

Note that I'm only using the TLD list from your websites variable; there are many more you will probably want to add.
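If the list grows, it may be easier to keep the TLDs in a Python list and build the pattern from it. A rough sketch, assuming a hypothetical TLDS list (the extra entries beyond com/net/org/io/gov are only examples) and a \b so that only a whole TLD after the dot prevents the split:

```python
import re

# Hypothetical extended TLD list; add or remove entries as needed.
TLDS = ["com", "net", "org", "io", "gov", "edu", "info", "fr", "de"]

def build_pattern(tlds):
    """Compile a regex matching '.' when it is not followed by a whole TLD."""
    # re.escape guards against entries containing regex metacharacters
    alternation = "|".join(re.escape(t) for t in tlds)
    return re.compile(r"\.(?!(?:" + alternation + r")\b)")

PATTERN = build_pattern(TLDS)

def normalize(text):
    return PATTERN.sub(". ", text)

a = [' boys.aussi', 'interpretation.une', 'amour.hugh', 'amy.com', 'frenchemabassy.net', 'site.fr']
print([normalize(s) for s in a])
```

Only single-label TLDs are handled here; a multi-part suffix such as "co.uk" would still get its second dot split.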
