Python: searching for different strings on the same line

Posted 2024-09-25 14:22:53

I would like to optimize the following code:

if re.search(str(stringA), line) and re.search(str(stringB), line):
    .....
    .....

I tried:

stringAB = stringA + '.*' + stringB
if re.search(str(stringAB), line):
    .....
    .....

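But the results I get are not reliable. I am using re.search because it seems to be the only way to search for the exact regular expressions specified in stringA and stringB.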

The logic behind this code is modeled on the following egrep command example:

stringA=Success
stringB=mysqlDB01

egrep "${stringA}" /var/app/mydata | egrep "${stringB}"

If there is a better way to do this without re.search, please let me know.


1 answer
User
#1 · Posted 2024-09-25 14:22:53

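One approach is to build a pattern that matches either word (using \b so that only whole words match), use re.findall to collect every match in the string, and then use set equality to check that both words were matched.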

import re

stringA = "spam"
stringB = "egg"

words = {stringA, stringB}

# Make a pattern that matches either word
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))

data = [
    "this string has spam in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.findall(s)
    print(repr(s), found, set(found) == words)   

Output

'this string has spam in it' ['spam'] False
'this string has egg in it' ['egg'] False
'this string has egg in it and another egg too' ['egg', 'egg'] False
'this string has both egg and spam in it' ['egg', 'spam'] True
"the word spams shouldn't match" [] False
"and eggs shouldn't match, either" [] False

A more efficient alternative to set(found) == words is words.issubset(found), because it skips the explicit conversion of found to a set.
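A minimal sketch of the issubset variant, reusing the words and pattern from the example above. The helper name has_all_words is hypothetical and only there for illustration; issubset accepts any iterable, so the list returned by findall can be passed directly.

```python
import re

words = {"spam", "egg"}

# Same either-word pattern as in the example above
pat = re.compile(r"\bspam\b|\begg\b")

def has_all_words(s):
    # set.issubset accepts any iterable, so no set(...) call is needed
    return words.issubset(pat.findall(s))

print(has_all_words("this string has both egg and spam in it"))        # True
print(has_all_words("this string has egg in it and another egg too"))  # False
```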


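As Jon Clements mentioned in a comment, we can simplify and generalize the pattern to handle any number of words, and we should use re.escape in case any of the words contain regex metacharacters.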

pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))

Thanks, Jon!
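To see why re.escape matters, here is a small sketch that wraps the generalized pattern in a function. The name contains_all_words and the sample word "db.01" are assumptions made up for this illustration; it also assumes every word starts and ends with a word character (otherwise \b will not behave as intended) and that no word is contained in another.

```python
import re

def contains_all_words(line, words):
    """Return True if every word occurs in line as a whole word."""
    words = set(words)
    # Longest first, so a longer alternative is tried before a shorter one
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pat = re.compile(r"\b(?:{})\b".format(alternatives))
    return words.issubset(pat.findall(line))

targets = ["db.01", "Success"]  # hypothetical sample words
print(contains_all_words("db.01 reported Success", targets))  # True
print(contains_all_words("dbx01 reported Success", targets))  # False
```

Without re.escape, the "." in "db.01" would match any character, so "dbx01" would be accepted as well.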


Here is a version that matches the words in the specified order. If it finds a match, it prints the matching substring; otherwise it prints None.

import re

stringA = "spam"
stringB = "egg"
words = [stringA, stringB]

# Make a pattern that matches all the words, in order
pat = r"\b.*?\b".join([re.escape(word) for word in words])
pat = re.compile(r"\b" + pat + r"\b")

data = [
    "this string has spam and also egg, in the proper order",
    "this string has spam in it",
    "this string has spamegg in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.search(s)
    if found:
        found = found.group()
    print('{!r}: {!r}'.format(s, found))

Output

'this string has spam and also egg, in the proper order': 'spam and also egg'
'this string has spam in it': None
'this string has spamegg in it': None
'this string has egg in it': None
'this string has egg in it and another egg too': None
'this string has both egg and spam in it': None
"the word spams shouldn't match": None
"and eggs shouldn't match, either": None

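For comparison, the egrep pipeline in the question is order-independent and treats each string as a regular expression rather than as a whole word. The sketch below mirrors that behaviour under those assumptions; grep_all and the sample lines are hypothetical and only for illustration. It also shows why joining the two patterns with '.*' is unreliable: that pattern only matches when stringA appears before stringB.

```python
import re

def grep_all(lines, patterns):
    """Keep the lines in which every pattern matches, in any order."""
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if all(c.search(ln) for c in compiled)]

stringA = "Success"
stringB = "mysqlDB01"

lines = [  # made-up sample data
    "job 1: Success on mysqlDB01",
    "mysqlDB01 reported Success",
    "job 2: Success on mysqlDB02",
    "job 3: Failure on mysqlDB01",
]

for ln in grep_all(lines, [stringA, stringB]):
    print(ln)

# The '.*' approach misses the line where stringB comes first
print(re.search(stringA + ".*" + stringB, "mysqlDB01 reported Success"))  # None
```

Compiling the patterns once outside the loop avoids repeated cache lookups when scanning a large file.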