Keeping only alphabetic characters in a string (multilingual)

1 answer

User

#1 · Posted 2024-10-01 09:28:46

[^\W\d_]

In Python 3, or in Python 2 with the re.UNICODE flag, you can use [^\W\d_].

\W : If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.

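So [^\W\d_] matches anything that is not non-alphanumeric, not a digit, and not an underscore. In other words, it matches any alphabetic character. :)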

>>> import re
>>> re.findall(r"[^\W\d_]", "jüste Ä tösté 1234 ßÜ א д", re.UNICODE)
['j', 'ü', 's', 't', 'e', 'Ä', 't', 'ö', 's', 't', 'é', 'ß', 'Ü', 'א', 'д']
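
Since the goal is to keep only the letters, the individual matches can be joined back into a string. A minimal sketch, assuming Python 3 (where str patterns are Unicode-aware without passing re.UNICODE); the name `keep_letters` is hypothetical and not part of the answer:

```python
import re

# One character that is a word character but neither a digit nor "_".
LETTER = re.compile(r"[^\W\d_]")

def keep_letters(text):
    """Return text with every character that LETTER does not match removed."""
    return "".join(LETTER.findall(text))

print(keep_letters("jüste Ä tösté 1234 ßÜ א д"))
```

Note that whitespace is dropped as well, because a space is not a word character. If word boundaries matter, the pattern would need to allow `\s` too.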

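Remove digits first, then look for "\w"

To avoid this convoluted logic, you can also remove the digits and underscores first, and then look for alphanumeric characters: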

^{pr2}$
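
The two steps just described can be sketched as follows. This is an illustration under the assumption of Python 3, not necessarily the code the answer originally showed; the variable names are made up for the example:

```python
import re

text = "jüste Ä tösté 1234 ßÜ א д"

# Step 1: strip digits and underscores.
without_digits = re.sub(r"[\d_]", "", text)

# Step 2: whatever \w still matches is a word character that is
# neither a decimal digit nor an underscore.
letters = re.findall(r"\w", without_digits)
print(letters)
```

The result should be the same list of characters that the single pattern [^\W\d_] finds in the same input.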

The regex module

The regex module may also help, since it understands \p{L} or {}.

This regex implementation is backwards-compatible with the standard ‘re’ module, but offers additional functionality.

>>> import regex as re
>>> re.findall(r"\p{L}", "jüste Ä tösté 1234 ßÜ א д", re.UNICODE)
['j', 'ü', 's', 't', 'e', 'Ä', 't', 'ö', 's', 't', 'é', 'ß', 'Ü', 'א', 'д']

(Tested with Python 3.6)
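
regex is a third-party package. Where installing it is not an option, the standard library's unicodedata module exposes the Unicode general categories that \p{L} refers to. This is a different technique from the one in the answer, sketched here assuming Python 3; `is_letter` and `keep_letters_by_category` are hypothetical names:

```python
import unicodedata

def is_letter(ch):
    # The letter categories Lu, Ll, Lt, Lm and Lo all start with "L".
    return unicodedata.category(ch).startswith("L")

def keep_letters_by_category(text):
    """Keep only characters whose Unicode general category is a letter."""
    return "".join(ch for ch in text if is_letter(ch))

print(keep_letters_by_category("jüste Ä tösté 1234 ßÜ א д"))
```

One difference worth knowing: Python's \w also accepts numeric characters such as "²" or "½" that \d does not cover, so [^\W\d_] lets them through, whereas the category check above rejects them.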
