Python: searching for different strings on the same line

1 answer

User

#1 · Posted 2024-09-25 14:22:53

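One approach is to build a pattern that matches either word (using \b, so that only whole words match), use re.findall to collect every match in the string, and then use set equality to check that both words were found.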

import re

stringA = "spam"
stringB = "egg"

words = {stringA, stringB}

# Make a pattern that matches either word
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))

data = [
    "this string has spam in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.findall(s)
    print(repr(s), found, set(found) == words)

Output

'this string has spam in it' ['spam'] False
'this string has egg in it' ['egg'] False
'this string has egg in it and another egg too' ['egg', 'egg'] False
'this string has both egg and spam in it' ['egg', 'spam'] True
"the word spams shouldn't match" [] False
"and eggs shouldn't match, either" [] False

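Instead of set(found) == words, you can use words.issubset(found). issubset() accepts any iterable, so you can pass the list returned by findall directly without converting found to a set yourself. The two tests give the same result here because found can only ever contain the target words.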
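The two tests can be compared on a few of the sample lines. This is a minimal sketch; the helper names all_words_equal and all_words_subset are hypothetical, and the pattern is the two-word pattern from the code above written out literally.

```python
import re

words = {"spam", "egg"}
pat = re.compile(r"\bspam\b|\begg\b")

def all_words_equal(line):
    # Original test: convert the matches to a set and compare for equality.
    return set(pat.findall(line)) == words

def all_words_subset(line):
    # Alternative test: issubset() accepts any iterable, so the list
    # returned by findall() can be passed in directly.
    return words.issubset(pat.findall(line))

lines = [
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
]
for line in lines:
    # Both columns agree: False False, True True, False False
    print(all_words_equal(line), all_words_subset(line))
```

The equivalence depends on the pattern only matching words from the set. If the pattern could also match other strings, the equality test would return False for a line containing an extra match, while issubset would still return True.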

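As Jon Clements pointed out in a comment, the pattern can be simplified and generalized to handle any number of words. We should also use re.escape in case any of the words contain regex metacharacters.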

pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
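Wrapped in a function, the generalized pattern could be used as in the sketch below. The function name contains_all_words is hypothetical. The word "3.5" is included only to show why re.escape matters: unescaped, the "." would match any character, so "3x5" would also be accepted.

```python
import re

def contains_all_words(line, words):
    """Return True if every word in `words` occurs in `line` as a whole word."""
    # Escape each word so metacharacters such as "." are matched literally,
    # then join the words into one alternation inside a capturing group.
    pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
    # With exactly one capturing group, findall() returns the matched words.
    return set(words).issubset(pat.findall(line))

words = {"spam", "egg", "3.5"}
print(contains_all_words("spam, egg and python 3.5", words))  # True
print(contains_all_words("spam, egg and python 3x5", words))  # False
print(contains_all_words("spam and egg only", words))          # False
```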

Thanks, Jon!

Below is a version that matches the words in the specified order. If a match is found, it prints the matched substring; otherwise it prints None.

import re

stringA = "spam"
stringB = "egg"
words = [stringA, stringB]

# Make a pattern that matches all the words, in order
pat = r"\b.*?\b".join([re.escape(word) for word in words])
pat = re.compile(r"\b" + pat + r"\b")

data = [
    "this string has spam and also egg, in the proper order",
    "this string has spam in it",
    "this string has spamegg in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.search(s)
    if found:
        found = found.group()
    print('{!r}: {!r}'.format(s, found))

Output

'this string has spam and also egg, in the proper order': 'spam and also egg'
'this string has spam in it': None
'this string has spamegg in it': None
'this string has egg in it': None
'this string has egg in it and another egg too': None
'this string has both egg and spam in it': None
"the word spams shouldn't match": None
"and eggs shouldn't match, either": None
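The ordered version generalizes to any number of words in the same way. The sketch below wraps it in a hypothetical helper, find_in_order, using the same pattern construction as above (re.search is called directly instead of compiling first).

```python
import re

def find_in_order(line, words):
    """Return the substring spanning `words` in the given order, or None."""
    # Each word is escaped and bounded by \b; between consecutive words,
    # the lazy .*? skips over whatever else is on the line.
    body = r"\b.*?\b".join(re.escape(word) for word in words)
    match = re.search(r"\b" + body + r"\b", line)
    return match.group() if match else None

words = ["spam", "egg", "ham"]
print(find_in_order("spam first, then egg, and finally ham", words))
print(find_in_order("ham, egg, spam: wrong order", words))  # None
```

Because "." does not match a newline unless re.DOTALL is set, all of the words must appear on the same line for the pattern to match, which fits the original question.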
