How do I create a regular expression pattern to extract a character from a series of differently structured strings?

import re

test_strings = [
    "Holics u 5/a",
    "Holics U 5/a",
    "Holics u5/a",
    "Huolics u 5/a",
    "Holics u. 5/a",
    "Holuics u5",
    "Holics and other stuff u more stuff after 5",
    "Houlics utca 5"
]

# two regex patterns I have considered
print("First regex pattern ------------------------------------")
pattern = r"[^\w+][uU]"
replacement_text = " utca "
for item in test_strings:
    print(re.sub(pattern, replacement_text, item))

print("\nSecond regex pattern ------------------------------------")
pattern = r"[^\w+][uU][^tca]"
replacement_text = " utca "
for item in test_strings:
    print(re.sub(pattern, replacement_text, item))

Holics utca 5/a
Holics utca 5/a
Holics utca 5/a
Huolics utca 5/a
Holics utca . 5/a
Holuics utca 5
Holics and other stuff utca more stuff after 5
Houlics utca tca 5   # <-------------------------------- issue

Holics utca 5/a
Holics utca 5/a
Holics utca /a   # <----------------------------------- issue
Huolics utca 5/a
Holics utca 5/a
Holuics utca   # <-------------------------------------- issue
Holics and other stuff utca more stuff after 5
Houlics utca 5

1 Answer

User

#1 · Posted on 2024-10-02 10:24:13

You can use

re.sub(r'\b[uU](?=\b|\d)\.?\s*', 'utca ', s)

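Details

\b - a word boundary
[uU] - u or U
(?=\b|\d) - immediately to the right of the current position there must be a word boundary or a digit
\.? - an optional dot
\s* - zero or more whitespace characters.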
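As a quick check, the pattern above can be run against the test strings from the question. This is a minimal sketch: the names PATTERN, normalize and results are only illustrative and do not come from the answer.

```python
import re

# Pattern from the answer: a standalone u/U that is followed by a word
# boundary or a digit, plus an optional dot and any trailing whitespace.
PATTERN = re.compile(r'\b[uU](?=\b|\d)\.?\s*')

def normalize(s):
    """Replace the standalone u/U abbreviation with 'utca '."""
    return PATTERN.sub('utca ', s)

test_strings = [
    "Holics u 5/a",
    "Holics U 5/a",
    "Holics u5/a",
    "Huolics u 5/a",
    "Holics u. 5/a",
    "Holuics u5",
    "Holics and other stuff u more stuff after 5",
    "Houlics utca 5",
]

results = [normalize(s) for s in test_strings]
for before, after in zip(test_strings, results):
    print(f"{before!r} -> {after!r}")
```

Because the dot and the whitespace are consumed by the match and the replacement supplies exactly one space, "u. 5/a" and "u5/a" both come out as "utca 5/a", and an existing "utca" is left untouched.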

Alternatively, you can use

re.sub(r'\b[uU](?=\b|(?![^\W\d_]))\.?\s*', 'utca ', s)

See the regex demo and another regex demo.

Here, instead of requiring a digit, (?![^\W\d_]) fails the match if the next character is a letter.
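The difference between the two patterns only shows when the u is directly followed by a character that is neither a letter nor a digit but still counts as a word character, such as an underscore. The sketch below compares them; the input "Holics u_5" is a made-up example that does not appear in the question, and the names DIGIT_VARIANT, NOT_LETTER_VARIANT and compare are illustrative.

```python
import re

# First pattern from the answer: requires a word boundary or a digit after u/U.
DIGIT_VARIANT = re.compile(r'\b[uU](?=\b|\d)\.?\s*')
# Second pattern from the answer: only forbids a letter right after u/U.
NOT_LETTER_VARIANT = re.compile(r'\b[uU](?=\b|(?![^\W\d_]))\.?\s*')

def compare(s):
    """Return the result of both substitutions for the same input."""
    return (DIGIT_VARIANT.sub('utca ', s),
            NOT_LETTER_VARIANT.sub('utca ', s))

# "Holics u_5" is a hypothetical input, not one of the question's strings.
for sample in ["Holics u5", "Houlics utca 5", "Holics u_5"]:
    print(sample, '->', compare(sample))
```

Both variants agree on the strings from the question. On the hypothetical "Holics u_5" the digit variant leaves the string unchanged, while the not-a-letter variant produces "Holics utca _5".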
