Writing a regular expression that matches 1 or 2 of the letters, but not all 3

Posted 2024-09-30 12:13:32


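I'm trying to write a regular expression that matches strings made up of x, y, or z, where only 1 or 2 of those letters may appear in any one string.

For example, valid strings: xxxx, xxxyyyy, xyxyx, zyzzzyyy, xzzzxx

Invalid strings: xyz, xxxyyyyz, zxzyy

This is what I wrote at first: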

regex = re.compile("((x*y*)*)|((x*z*)*)|((y*z*)*)")

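My logic was that it would test the string against xy first, then xz, then yz. Unfortunately, this doesn't work. It works for my first test string, xyxyx, but it doesn't match my second string, zyzyzyzy. Am I using the vertical "or" bar incorrectly?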


3 answers

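I'm not quite sure how you arrived at what you have, but if you want to match a sequence of (only x and y) or (only x and z) or (only y and z), you can use an expression like this: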

^([xy]*|[xz]*|[yz]*)$

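A character class (square brackets) is a convenient way to say "any one of these characters". So [xy]* means "a sequence of any length consisting only of the characters x and y".

^ and $ (the start and end anchors) indicate that the pattern must match the entire string.

Also, if you want to prevent "" (the empty string) from matching, replace every * with +.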
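As a rough check of the anchored pattern, the sketch below runs the `+` variant against the test strings from the question. It also shows one likely reason the original attempt seemed to fail: assuming the question's pattern in its balanced form and a call to `re.match`, the first alternative can match the empty string, so the later alternatives are never tried. The helper name `is_valid` is only illustrative.

```python
import re

# Anchored pattern from the answer, using + so the empty string is rejected.
pattern = re.compile(r"^([xy]+|[xz]+|[yz]+)$")

valid = ["xxxx", "xxxyyyy", "xyxyx", "zyzzzyyy", "xzzzxx"]
invalid = ["xyz", "xxxyyyyz", "zxzyy"]

def is_valid(s):
    """Return True if s uses at most two of the letters x, y, z."""
    return pattern.match(s) is not None

for s in valid + invalid:
    print(s, is_valid(s))

# Assumed balanced form of the question's pattern, without anchors.
original = re.compile(r"((x*y*)*)|((x*z*)*)|((y*z*)*)")

# The first alternative succeeds with a zero-length match at position 0,
# so the match object exists but covers none of the string.
m = original.match("zyzyzyzy")
print(repr(m.group(0)))
```

With the anchors in place, a branch that matches too little fails at `$` and the engine moves on to the next alternative, which is what makes the alternation behave as intended.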

You need to assert word boundaries (\b) at the start and end, then alternate with | between three different character classes:

\b([xy]+|[zy]+|[xz]+)\b


You can also use the simpler, faster regex \b[xyz]+\b and combine it with Python logic:

[w for w in re.findall(r'\b[xyz]+\b', txt) if len(set(w))<=2]
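A minimal, self-contained version of that one-liner might look like the following. The sample text in `txt` is made up for illustration. The regex only collects words built from x, y, and z, and the `set` check then keeps those with at most two distinct letters.

```python
import re

# Hypothetical sample text: a mix of valid and invalid words.
txt = "xxxx xyz xyxyx zxzyy xzzzxx"

# Step 1: find every whole word made up only of x, y, z.
candidates = re.findall(r"\b[xyz]+\b", txt)

# Step 2: keep the words that contain at most two distinct letters.
result = [w for w in candidates if len(set(w)) <= 2]

print(result)
```

Splitting the work this way keeps the regex trivial and moves the "no more than two distinct letters" rule into ordinary Python, which also scales to larger alphabets without writing out every pair of letters.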


Use a lookahead to make sure that any string containing three (or more) different characters fails:

^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$


Python:

regex = r"^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$"
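As a quick sanity check, the pattern can be run against the test strings from the question. This is only a sketch, and the helper name `at_most_two_letters` is hypothetical.

```python
import re

regex = r"^(?!.*(.).*(?!\1)(.).*(?!\1|\2).)[xyz]+$"

def at_most_two_letters(s):
    """Return True if s is made of x/y/z and uses at most two distinct letters."""
    return re.match(regex, s) is not None

valid = ["xxxx", "xxxyyyy", "xyxyx", "zyzzzyyy", "xzzzxx"]
invalid = ["xyz", "xxxyyyyz", "zxzyy"]

for s in valid + invalid:
    print(s, at_most_two_letters(s))
```

The negative lookahead rejects a string as soon as it can find three characters that all differ from one another, and `[xyz]+` then restricts what remains to the allowed letters.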

Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \1                       what was matched by capture \1
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \2                       what was matched by capture \2
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [xyz]+                   any character of: 'x', 'y', 'z' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
