Finding complex substrings in Python with wildcards

Posted 2024-06-20 15:12:42

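I am trying to locate the position of an expression in a long string. The expression is defined as follows: any element of list1, followed by a wildcard of 1 to 5 words (separated by spaces), followed by any element of list2. For example: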

list1 = ["a", "b"]
list2 = ["c", "d"]
text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"

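This should return "37", because that is where the expression ("bli blubb d") can be found. I have looked into regex wildcards, but I am struggling to combine them with the different list elements and the variable length of the wildcard.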

Thanks for any suggestions!


Tags: string, text, locate, element, expression, word, principle, space
1 answer
User
#1 · Posted 2024-06-20 15:12:42

You can construct a regular expression:

import re

pref=["a","b"]
suff=["c","d"]

# the pattern is dynamically constructed from your pref and suff lists.
patt = r"(?:\W|^)((?:" + '|'.join(pref) + r")(?: +[^ ]+){1,5} +(?:" + '|'.join(suff) + r"))(?:\W|$)"

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"

print(patt)

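# finditer yields match objects, so the position comes from the match itself;
# text.index(k) would report the first occurrence of the matched text instead.
for m in re.finditer(patt, text):
    print(m.group(1), "\n", m.start(1))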

Output:

(?:\W|^)((?:a|b)(?: +[^ ]+){1,5} +(?:c|d))(?:\W|$)  # pattern
a  bli blubb d                                      # found text
33                                                  # position (your 37 is wrong btw.)

Fair warning: this is not a very robust approach.

The regular expression built by the code above reads roughly as:

Either the start of the string or a non-word character (not captured), followed by
one of your prefs, followed by 1-5 non-space things, each preceded by 1-n spaces,
followed by 1-n spaces and something from suff, followed by
(a non-captured non-word character or the end of the string)
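
The same reading can be attached to the pattern itself. The following is a minimal sketch, assuming the pref and suff lists from the answer, that spells the pattern out with re.VERBOSE so each part carries its own comment. The behaviour is meant to be identical to the one-line pattern; only the layout differs.

```python
import re

pref = ["a", "b"]
suff = ["c", "d"]

# Same pattern as the one-liner, laid out with re.VERBOSE. A literal space is
# written as [ ] because verbose mode ignores bare whitespace in the pattern.
patt = re.compile(
    r"""
    (?:\W|^)               # start of string or one non-word character (not captured)
    (                      # group 1: the expression we want to locate
      (?:%s)               #   one element of pref
      (?:[ ]+[^ ]+){1,5}   #   1 to 5 words, each preceded by one or more spaces
      [ ]+                 #   one or more spaces before the closing element
      (?:%s)               #   one element of suff
    )
    (?:\W|$)               # one non-word character or end of string (not captured)
    """ % ("|".join(pref), "|".join(suff)),
    re.VERBOSE,
)

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"
m = patt.search(text)
print(m.group(1), m.start(1))  # a  bli blubb d 33
```

Note that without re.MULTILINE, ^ and $ anchor to the whole string rather than to individual lines.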

For a demo and a more complete description of the assembled regex, see https://regex101.com/r/WHZfr9/1
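
Regarding the robustness warning, here is one possible way to harden the pattern, offered as a sketch rather than a definitive fix. It escapes each list element with re.escape so regex metacharacters in the lists match literally, uses \s and \S so tabs and newlines also count as separators, and reads the position from the match object. The helper name find_spans and its max_words parameter are hypothetical and not part of the original answer. Unlike the original, it requires whitespace (or the string boundary) around the expression, so "d," would no longer count as a closing element.

```python
import re

def find_spans(text, pref, suff, max_words=5):
    """Return (start, matched_text) for each pref ... suff expression.

    Hypothetical helper: an illustration, not the original answer's code.
    """
    first = "|".join(re.escape(p) for p in pref)
    last = "|".join(re.escape(s) for s in suff)
    # (?<!\S) and (?!\S) demand whitespace or the string boundary on either
    # side without consuming it, so the match is exactly the expression.
    patt = (
        r"(?<!\S)(?:" + first + r")"
        r"(?:\s+\S+){1," + str(max_words) + r"}"
        r"\s+(?:" + last + r")(?!\S)"
    )
    return [(m.start(), m.group()) for m in re.finditer(patt, text)]

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"
print(find_spans(text, ["a", "b"], ["c", "d"]))
# [(33, 'a  bli blubb d')]
```

re.finditer still returns non-overlapping matches only, so an expression that starts inside an earlier match is not reported.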
