Why does the regex r'[a|(an)|(the)]+' match 'h' and 'he' separately instead of matching 'the' as a whole? - Q&A - Python中文网


Posted 2024-09-27 21:30:31



I am trying to find 'a', 'an', and 'the' in a given text. The expression r'[a|(an)|(the)]+' only recognizes 'a' and does not recognize 'an' or 'the'.

import nltk
nltk.re_show(r'[a|(an)|(the)]+', 'sdfkisdfjstdskfhdsklfjkhe an skfjkla')

This gives me the output:

sdfkisdfjs{t}dskf{h}dsklfjk{h}{e} {a}{n} skfjkl{a}

I also tried:

nltk.re_show(r'[a|<an>|<the>]+', 'sdfkisdfjstdskfhdsklfjkhe an skfjkla')

and got the output:

sdfkisdfjs{t}dskf{h}dsklfjk{he} {an} skfjkl{a}

I don't understand why 'h' and 'he' are being recognized.

In this case, what regular expression would correctly recognize 'a', 'an', and 'the' in the given text?
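nltk.re_show prints the input string with every match wrapped in braces. Assuming that is essentially all it does (a substitution that puts markers around each non-overlapping match), the hypothetical helper show_matches below is a minimal standard-library stand-in that makes the marked output easy to reproduce and experiment with. Each {...} is one separate match.

```python
import re

def show_matches(pattern, string, left='{', right='}'):
    """Hypothetical stand-in for nltk.re_show: wrap each match in markers."""
    # \g<0> in the replacement string refers to the whole matched text.
    return re.sub(pattern, left + r'\g<0>' + right, string)

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla'
marked = show_matches(r'[a|<an>|<t he>]+'.replace(' ', ''), text)
print(marked)  # sdfkisdfjs{t}dskf{h}dsklfjk{he} {an} skfjkl{a}
```

Unlike nltk.re_show, this version returns the string instead of printing it, which makes it easier to compare results from different patterns.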


2 answers

Anonymous user

#1 · edited 2024-09-27 21:30:31

Use plain alternation instead of a character class:

import re

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla a dsda the dsathekoo'
array = re.findall(r'the|an|a', text)

print(array)

Output: ['an', 'a', 'a', 'a', 'the', 'a', 'the']
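The pattern above also matches the 'a' inside words such as 'skfjkla', 'dsda' and 'dsathekoo'. If you only want 'a', 'an' and 'the' as whole words (an assumption beyond what the question asks), a sketch with word boundaries \b and a non-capturing group could look like this:

```python
import re

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla a dsda the dsathekoo'

# \b requires a word boundary on both sides, so letters inside longer words are skipped.
# (?:...) groups the alternatives without capturing, so findall returns whole matches.
array = re.findall(r'\b(?:the|an|a)\b', text)
print(array)  # ['an', 'a', 'the']
```

With the trailing \b, the order of the alternatives matters less: if 'a' is tried first on 'an', the boundary check after 'a' fails and the engine backtracks to try 'an'.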

Anonymous user

#2 · edited 2024-09-27 21:30:31

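Square brackets and parentheses mean different things. Square brackets specify "any one of the characters inside", so [a|(an)|(the)] is just the set of characters a, n, t, h, e, |, ( and ), and the + matches any run of them. That is why 'h' and 'he' are matched: they are runs of letters from that set, not pieces of 'the'. The same applies to [a|<an>|<the>]+, where < and > are literal characters too.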
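A quick way to see the difference is to compare re.findall results for the character class with and without +, and for real alternation, on the question's text:

```python
import re

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla'

# Inside [...], |, ( and ) are ordinary characters: the class is a set of single characters.
single_chars = re.findall(r'[a|(an)|(the)]', text)
# + extends each match to a run of characters from that set.
runs = re.findall(r'[a|(an)|(the)]+', text)
# Outside brackets, | separates whole alternatives, so each word is one unit.
words = re.findall(r'an|a|the', text)

print(single_chars)  # ['t', 'h', 'h', 'e', 'a', 'n', 'a']
print(runs)          # ['t', 'h', 'he', 'an', 'a']
print(words)         # ['an', 'a']
```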

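Also note that if you want to match 'an', you don't want the match to stop at 'a', which means you have to reverse the order: alternatives are tried from left to right, so an must come before a.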
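The effect of the order can be checked directly. Python's re tries alternatives left to right and accepts the first one that fits, even if a later alternative would match more text:

```python
import re

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla'

# 'a' is tried first, so the word 'an' is reported as just 'a'.
a_first = re.findall(r'a|an|the', text)
# 'an' is tried first, so it matches the whole word.
an_first = re.findall(r'an|a|the', text)

print(a_first)   # ['a', 'a']
print(an_first)  # ['an', 'a']
```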

So instead of

[a|(an)|(the)]+

what you want is probably

(an|a|the)+

or just

(an|a|the)

or (less readable)

(an?|the)
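If you use these patterns with re.findall rather than nltk.re_show, the parentheses have a side effect: with exactly one capturing group, findall returns what the group captured, not the whole match. For a repeated group such as (an|a|the)+, that is only the last repetition. A small sketch, where the sample string 'anthe a' is made up for illustration:

```python
import re

text = 'sdfkisdfjstdskfhdsklfjkhe an skfjkla'

# On the question's text all three patterns find the same words.
for pattern in [r'(an|a|the)+', r'(an|a|the)', r'(an?|the)']:
    print(pattern, re.findall(pattern, text))  # ['an', 'a'] each time

# With +, 'anthe' is a single match, but the group keeps only its LAST
# repetition, so findall reports 'the' while group(0) holds the full match.
sample = 'anthe a'
captured = re.findall(r'(an|a|the)+', sample)
whole = [m.group(0) for m in re.finditer(r'(an|a|the)+', sample)]
print(captured)  # ['the', 'a']
print(whole)     # ['anthe', 'a']
```

Writing the group as non-capturing, for example (?:an|a|the)+, makes findall return the whole match instead.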

(Yes, there are usually many regexes for the same problem.)
