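Regex to extract substrings that start with an uppercase letter from a list, with French special characters

1 Answer

User

#1 · Posted 2024-09-30 04:41:55

Because this comes down to handling specific Unicode characters, I suggest using the PyPi regex module (install it with pip install regex); then you can use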

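import regex  # third-party module: pip install regex

text = "Français Langues bantoues Presse écrite Gabon Particularité linguistique"
# Split at every word boundary followed by an uppercase letter, except at the string start
matches = regex.split(r'(?!\A)\b(?=\p{Lu})', text)
# Each chunk keeps the whitespace before the split point, so strip it
print(list(map(lambda x: x.strip(), matches)))
# => ['Français', 'Langues bantoues', 'Presse écrite', 'Gabon', 'Particularité linguistique']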

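See the online Python demo and the regex demo. Details:

(?!\A) - a position other than the start of the string
\b - a word boundary
(?=\p{Lu}) - a positive lookahead that requires the next character to be a Unicode uppercase letter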
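
To see what each part of the pattern contributes, here is a small sketch using only the standard re module. It assumes Python 3.7 or later (where re.split splits on zero-width matches) and uses the ASCII class [A-Z] as a stand-in for \p{Lu}, which re does not support. The sample text and the variable names are made up for illustration.

```python
import re

# Assumption: [A-Z] stands in for \p{Lu}, which the standard re module lacks.
UPPER = r"[A-Z]"
text = "Gabon Presse iPhone"

# Without (?!\A), the zero-width match at index 0 yields an empty first item.
with_start = re.split(rf"\b(?={UPPER})", text)

# (?!\A) rejects the match at the very start of the string.
no_start = re.split(rf"(?!\A)\b(?={UPPER})", text)

# Without \b, the split also happens inside a word, before the "P" of "iPhone".
no_boundary = re.split(rf"(?!\A)(?={UPPER})", text)

print(with_start)   # ['', 'Gabon ', 'Presse iPhone']
print(no_start)     # ['Gabon ', 'Presse iPhone']
print(no_boundary)  # ['Gabon ', 'Presse i', 'Phone']
```

The chunks keep the space that precedes each split point, because the pattern itself matches no characters.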

Note that map(lambda x: x.strip(), matches) is used to strip the surplus whitespace from the resulting chunks.

You can also do this with re:

import re
import sys

text = "Français Langues bantoues Presse écrite Gabon Particularité linguistique"
# Emulate \p{Lu}: a character class of every code point that str.isupper() accepts
pLu = '[{}]'.format("".join([chr(i) for i in range(sys.maxunicode + 1) if chr(i).isupper()]))
matches = re.split(fr'(?!\A)\b(?={pLu})', text)
print(list(map(lambda x: x.strip(), matches)))
# => ['Français', 'Langues bantoues', 'Presse écrite', 'Gabon', 'Particularité linguistique']

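See this Python demo, but keep in mind that the set of Unicode uppercase letters Python recognizes varies between versions; the PyPi regex module gives more consistent results.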
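
As a sketch of a slightly stricter standard-library variant, the class can be built from unicodedata.category(c) == "Lu", the general category that \p{Lu} names; as far as I can tell, str.isupper() tests a related but somewhat broader property. The helper names build_lu_class and split_on_capitals are hypothetical, the second sample string is made up, and the result still depends on the Unicode database bundled with the interpreter. Python 3.7 or later is assumed, so that re.split splits on zero-width matches.

```python
import re
import sys
import unicodedata


def build_lu_class():
    """Return a regex character class listing every code point in category Lu."""
    code_points = (chr(i) for i in range(sys.maxunicode + 1))
    letters = "".join(c for c in code_points if unicodedata.category(c) == "Lu")
    return "[{}]".format(letters)


# Compile once: scanning the whole Unicode range is the slow part.
SPLIT_RE = re.compile(rf"(?!\A)\b(?={build_lu_class()})")


def split_on_capitals(text):
    """Split text before each capitalized word and trim the chunks."""
    return [chunk.strip() for chunk in SPLIT_RE.split(text)]


print(split_on_capitals("École normale Île de France"))
# ['École normale', 'Île de', 'France']
```

Compiling the pattern once at import time avoids rebuilding the large character class on every call.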
