Simple case folding vs. full case folding in Python's regex module

Posted 2024-09-29 23:25:23


This is the module I'm asking about: https://pypi.org/project/regex/, Matthew Barnett's regex.

The project description page lists the differences in behaviour between V0 and V1 as follows (note the lines about case-folding):

Old vs new behaviour

In order to be compatible with the re module, this module has 2 behaviours:

  • Version 0 behaviour (old behaviour, compatible with the re module):

    Please note that the re module’s behaviour may change over time, and I’ll endeavour to match that behaviour in version 0.

    • Indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.
    • Case-insensitive matches in Unicode use simple case-folding by default.
  • Version 1 behaviour (new behaviour, possibly different from the re module):

    • Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
    • Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION.

I tried a few examples myself, but couldn't work out what difference it makes:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> r = regex.compile("(?V0i)и")
>>> r
regex.Regex('(?V0i)и', flags=regex.I | regex.V0)
>>> r.search("И")
<regex.Match object; span=(0, 1), match='И'>
>>> regex.search("(?V0i)é", "É")
<regex.Match object; span=(0, 1), match='É'>
>>> regex.search("(?V0i)é", "E")
>>> regex.search("(?V1i)é", "E")

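What is the difference between simple case folding and full case folding? Or can you give an example where a (case-insensitive) regex matches something in V1 but not in V0?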


1 Answer
User
Answer 1 · posted 2024-09-29 23:25:23

It follows the Unicode case folding table. An excerpt:

# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.

[...]

# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
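To make the C + S versus C + F rule concrete, here is a minimal sketch that parses a few lines in the format described above and builds both mappings. The four sample entries are ones I picked for illustration, and the helper names `build_fold_map` and `fold` are hypothetical; this is not how the regex module does it internally.

```python
# A few entries in the CaseFolding.txt format, hand-picked for illustration.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def build_fold_map(text, statuses):
    """Map each character to its folded form, keeping only the given statuses."""
    mapping = {}
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not entry:
            continue
        code, status, target = [field.strip() for field in entry.split(";")[:3]]
        if status in statuses:
            # The mapping field may hold several code points separated by spaces.
            mapping[chr(int(code, 16))] = "".join(
                chr(int(cp, 16)) for cp in target.split()
            )
    return mapping


def fold(s, mapping):
    """Fold a string character by character; unmapped characters stay as they are."""
    return "".join(mapping.get(ch, ch) for ch in s)


SIMPLE = build_fold_map(SAMPLE, {"C", "S"})  # usage A: C + S
FULL = build_fold_map(SAMPLE, {"C", "F"})    # usage B: C + F

# Simple folding never changes the length of a string; full folding can.
print(len(fold("\u1e9e", SIMPLE)), len(fold("\u1e9e", FULL)))  # 1 2
print(fold("\u00df", SIMPLE) == fold("ss", SIMPLE))            # False
print(fold("\u00df", FULL) == fold("ss", FULL))                # True
```

With these sample entries, the capital sharp s (U+1E9E) folds to the single character ß under simple folding and to "ss" under full folding, which is exactly the kind of entry where the S and F rows differ.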

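Only a few special characters fold differently, for example the Latin small letter sharp s (ß): full case folding maps it to the two characters "ss", whereas simple case folding leaves it as the single character ß.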
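A small sketch of how this shows up in practice. The first part uses only the standard library: `str.casefold()` applies full case folding, while `str.lower()` leaves ß unchanged, which is also what simple case folding does for this character. The second part assumes the third-party regex module is installed and is skipped otherwise; going by the project description quoted in the question, the V0 search should find nothing and the V1 search should match.

```python
# str.casefold() implements Unicode full case folding (statuses C + F).
full_equal = "ß".casefold() == "ss".casefold()

# str.lower() leaves ß unchanged, as simple case folding does for this character.
simple_like_equal = "ß".lower() == "ss".lower()

print(full_equal, simple_like_equal)  # True False

try:
    import regex  # third-party module from https://pypi.org/project/regex/
except ImportError:
    regex = None  # the standard-library comparison above still runs without it

if regex is not None:
    v0_match = regex.search("(?V0i)ß", "ss")  # simple case folding by default
    v1_match = regex.search("(?V1i)ß", "ss")  # full case folding by default
    print(v0_match is None, v1_match is not None)
```

The same idea explains the examples in the question: é and É, or и and И, have identical simple and full foldings, so V0 and V1 agree on them.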
