Regex to extract the text after a word and before a special character, excluding all the numbers

Posted 2024-09-22 20:33:25


I am trying to write a regex that, for the given sample text

Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)

produces the desired output

Minimum Rent Schedule (subiect to adjustment, if applicable)

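That is, everything between the word 'Section' and the special character ':'. However, as in this example, I don't want it to capture any of the section numbers.

What I have tried so far is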

[Section]+.*[:]

2 Answers

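Here is one approach: match Section, skip the run of digits, dots and whitespace that follows it, then lazily capture everything up to the first colon.

Example: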

import re

s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(r"Section[\d.\s]+(.*?):", s).group(1))

Output:

Minimum Rent Schedule (subiect to adjustment, if applicable)

If the text contains several such headings, use re.findall

Example:

print(re.findall(r"Section[\d.\s]+(.*?):", your_text))
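As a rough sketch of the multi-match case, the snippet below runs re.findall over a two-heading text. The first heading is the question's sample, shortened; the second heading ("Security Deposit") is invented purely for illustration.

```python
import re

# Hypothetical input: the first heading is from the question (shortened),
# the second one is made up for this example.
your_text = (
    "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater\n"
    "Section 3.2. Security Deposit:due upon signing"
)

# With one capture group, findall returns a list of the group-1 strings.
titles = re.findall(r"Section[\d.\s]+(.*?):", your_text)
print(titles)
# ['Minimum Rent Schedule (subiect to adjustment, if applicable)', 'Security Deposit']
```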

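The pattern you tried uses a character class: [Section]+ matches any of the listed characters (S, e, c, t, i, o, n) one or more times, in any order, not the literal word Section. The greedy .* then runs on to the last colon, so the numbers are included in the match.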
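A small sketch of that behaviour, using a shortened copy of the sample text (the word "notice" is just an arbitrary test string chosen for this illustration):

```python
import re

# Shortened version of the sample text from the question.
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater"

attempt = r"[Section]+.*[:]"

# [Section]+ is a set of letters, not the word, so it also accepts "notice".
accepts_notice = bool(re.fullmatch(r"[Section]+", "notice"))
print(accepts_notice)  # True

# On the sample text the whole prefix up to the colon is matched, numbers included.
whole = re.search(attempt, s).group(0)
print(whole)
# Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):
```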

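To skip the number tokens that follow Section, you could repeat, zero or more times, a run of non-whitespace characters that contains at least one digit, followed by a space.

The text after those tokens, up to the colon, is then captured in a group.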

Section (?:[^\s\d]*\d\S* )*([^:]+):

Explanation

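  • Section (with a trailing space) matches the literal word Section and a space
  • (?: opens a non-capturing group
    • [^\s\d]* uses a negated character class to match, 0+ times, any character except whitespace and digits
    • \d\S* (with a trailing space) then matches a digit, followed by 0+ non-whitespace characters and a space
  • )* closes the group and repeats it 0+ times
  • ([^:]+): captures in group 1 any character except :, 1+ times, then matches the :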
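The breakdown above can also be written into the pattern itself with re.VERBOSE. This is only a sketch of an equivalent pattern: in verbose mode unescaped whitespace is ignored, so the two literal spaces are written as [ ] here, and the sample string is shortened.

```python
import re

verbose = re.compile(
    r"""
    Section[ ]          # the literal word Section and one space
    (?:                 # non-capturing group for one numbering token
        [^\s\d]*        #   0+ characters that are neither whitespace nor digits
        \d\S*[ ]        #   a digit, the rest of the token, then a space
    )*                  # repeat the group 0+ times
    ([^:]+)             # group 1: 1+ characters other than a colon
    :                   # the closing colon
    """,
    re.VERBOSE,
)

s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater"
title = verbose.match(s).group(1)
print(title)  # Minimum Rent Schedule (subiect to adjustment, if applicable)
```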


For example

import re

regex = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(regex, s).group(1))

Result

Minimum Rent Schedule (subiect to adjustment, if applicable)
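Note that re.match returns None when the text does not start with a matching heading, so calling .group(1) on the result directly raises AttributeError. A minimal guarded sketch follows; the helper name extract_title, the use of re.search (to allow the heading anywhere in the text) and the test strings are assumptions made for illustration, not part of the original answer.

```python
import re

REGEX = re.compile(r"Section (?:[^\s\d]*\d\S* )*([^:]+):")

def extract_title(text):
    """Return the heading title after the section numbers, or None if absent."""
    m = REGEX.search(text)  # search, unlike match, scans the whole string
    return m.group(1).strip() if m else None

print(extract_title("See Section 2.1. Minimum Rent:foo"))  # Minimum Rent
print(extract_title("no heading here"))                    # None
```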

To find multiple matches, you could use re.findall:

print(re.findall(regex, s))

