Python regular expression for iteratively extracting text between delimiters

User

Reply #1 · edited 2024-10-02 20:36:15

Use a regular expression with a lookbehind and a lookahead:

>>> import re
>>> string = "I want A and I want B and I want C and..."
>>> re.findall(r'(?<=want ).*?(?= and)', string)
['A', 'B', 'C']

How it works

The regular expression has three parts:

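(?<=want )
A lookbehind: matches only at a position that is immediately preceded by the string "want " (including the trailing space).
.*?
Matches any character except a newline, zero or more times. The trailing ? makes the match non-greedy, so it takes the shortest string that still lets the whole regular expression succeed.
(?= and)
A lookahead: matches only if this point in the string is immediately followed by " and".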
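
To see why the non-greedy ? matters, here is a small sketch that runs the same pattern with and without it. The variable names text, lazy and greedy are only for illustration and are not part of the original answer.

```python
import re

text = "I want A and I want B and I want C and..."

# Non-greedy: each match stops at the first " and" after "want ".
lazy = re.findall(r'(?<=want ).*?(?= and)', text)

# Greedy: the first match runs on to the last " and" in the string.
greedy = re.findall(r'(?<=want ).*(?= and)', text)

print(lazy)    # ['A', 'B', 'C']
print(greedy)  # ['A and I want B and I want C']
```

Without the ?, the first match swallows everything up to the final " and", so only one long string comes back.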

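Also, note that string is the name of a standard-library module. It is best not to choose variable names that can clash with standard modules.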
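
A minimal sketch of what can go wrong, assuming the string module has been imported in the same scope (the original snippet does not import it, so this is only an illustration):

```python
import string

print(string.ascii_lowercase[:3])  # abc

# Rebinding the name hides the module for the rest of this scope.
string = "I want A and I want B"

error = None
try:
    string.ascii_lowercase  # a str object has no such attribute
except AttributeError as exc:
    error = exc
    print("module is hidden:", exc)
```

A name such as text or sentence avoids the problem entirely.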

Alternative

As AvinashRaj pointed out, we can also do this with a capture group instead of the lookbehind/lookahead combination:

>>> re.findall(r'\bwant\s+(.*?)\s+and\b', string)
['A', 'B', 'C']
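
If you would rather walk through the matches one at a time than build the whole list, re.finditer accepts the same pattern. This is a sketch, not part of the original answer; the names text, pattern and items are made up for the example.

```python
import re

text = "I want A and I want B and I want C and..."
pattern = re.compile(r'\bwant\s+(.*?)\s+and\b')

# finditer yields match objects lazily; group(1) is the captured text
# and start(1) is its offset in the input string.
items = []
for m in pattern.finditer(text):
    items.append((m.group(1), m.start(1)))

print(items)  # [('A', 7), ('B', 20), ('C', 33)]
```

The match objects also give you positions, which findall does not.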

User

Reply #2 · edited 2024-10-02 20:36:15

Here is a script that defines a reusable Findy function:

from __future__ import print_function
import re


def Findy(start, end, anystring):
    pattern = '{}(.*?){}'.format(start, end)
    return re.findall(pattern, anystring)

string = 'I want A and I want B and I want C and...'
print(Findy('want', 'and', string))

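Output: [' A ', ' B ', ' C ']

The pattern breaks down as follows:

start matches the characters in start
(.*?): . matches any character except a newline, * repeats it zero or more times, ? makes the repetition as short as possible (non-greedy), and () forms a capture group
end matches the characters in end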

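UPDATE: If you do not want the surrounding whitespace, you can use pattern = r'{}\s*(\S*?)\s*{}'.format(start, end). The r prefix makes it a raw string, so \s and \S reach the regex engine unchanged. Because \S excludes whitespace, the captured text itself cannot contain spaces.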

\s matches any white space character
\S matches any non-white space character

Output: ['A', 'B', 'C']
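
One thing to watch: Findy puts start and end into the pattern as-is, so delimiters such as [ or ( would be read as regex syntax. A possible variant, sketched here under the hypothetical name find_between, escapes them with re.escape and also allows spaces inside the captured text:

```python
import re


def find_between(start, end, text):
    """Hypothetical variant of Findy that treats the delimiters literally."""
    # re.escape neutralises regex metacharacters in the delimiters.
    pattern = r'{}\s*(.*?)\s*{}'.format(re.escape(start), re.escape(end))
    return re.findall(pattern, text)


print(find_between('[', ']', 'x [ a b ] y [c] z'))  # ['a b', 'c']
print(find_between('want', 'and', 'I want A and I want B and I want C and...'))
# ['A', 'B', 'C']
```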

User

Reply #3 · edited 2024-10-02 20:36:15

Not sure whether this code meets your requirements:

def findy(start, end, anystr):
    res = []
    tmp = anystr.split(start)[1:]
    for e in tmp:
        res.append(e.split(end)[0].strip())
    return res
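
For comparison, here is a short usage sketch. The function is restated as a list comprehension so the snippet runs on its own; the logic is meant to be the same as above. Note one difference from the regex answers: when the closing delimiter is missing, the split version still returns the rest of the string.

```python
import re


def findy(start, end, anystr):
    # Same logic as the loop above, written as a list comprehension.
    return [chunk.split(end)[0].strip() for chunk in anystr.split(start)[1:]]


print(findy('want', 'and', 'I want A and I want B and I want C and...'))
# ['A', 'B', 'C']

# No closing "and": split-based version keeps the tail, regex finds nothing.
print(findy('want', 'and', 'I want A'))                 # ['A']
print(re.findall(r'want\s*(.*?)\s*and', 'I want A'))    # []
```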
