Python's SequenceMatcher gives incomplete matches

from difflib import SequenceMatcher

str1 = "ABCDPQRUVWXYZ"
str2 = "PQRABCDUVWXYZ"

matchAll = SequenceMatcher(None, str1, str2, autojunk=False).get_matching_blocks()
for block in matchAll:
    print(str1[block.a:block.a + block.size])

3 answers

User

Answer 1 · edited 2024-09-26 17:50:05

This may be what you want, but it won't find overlapping matches (modified to include each substring's position in both s1 and s2):

str1 = "ABCDEPQRUVWXYZ" # added extra non-matching character
str2 = "PQRABCDUVWXYZ"

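def find_subs(s1, s2):
    """Greedily split s1 into the longest prefixes that occur anywhere in s2.

    Returns (position in s1, position in s2, substring) tuples.
    """
    subs = []
    loc = 0  # current position in the original s1
    while s1:
        s1_copy = s1
        # Shorten the candidate from the right until it occurs in s2.
        while s1_copy and s1_copy not in s2:
            s1_copy = s1_copy[:-1]
        if s1_copy:
            # Record the match, then skip past it.
            subs.append((loc, s2.index(s1_copy), s1_copy))
            loc += len(s1_copy)
            s1 = s1[len(s1_copy):]
        else:
            # The first character never occurs in s2: skip it.
            s1 = s1[1:]
            loc += 1
    return subs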

print(find_subs(str1, str2))

Prints:

[(0, 3, 'ABCD'), (5, 0, 'PQR'), (8, 7, 'UVWXYZ')]

User

Answer 2 · edited 2024-09-26 17:50:05

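Thanks to everyone who replied to my post.

As a solution, I experimented and found another approach using SequenceMatcher's find_longest_match() method. It basically consists of repeatedly finding the longest match between the two strings and replacing the matched substring with junk characters each time. This works as well.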
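A minimal sketch of that approach, assuming the goal is to collect every non-overlapping common block. The function name all_common_blocks and its min_size, junk_a and junk_b parameters are hypothetical. It assumes the two junk characters never occur in the inputs, and it uses a different junk character for each string so that blanked-out regions can never match each other. All four bounds are passed to find_longest_match() explicitly because its default arguments were only added in Python 3.9.

```python
from difflib import SequenceMatcher


def all_common_blocks(a, b, min_size=1, junk_a="\x00", junk_b="\x01"):
    """Repeatedly take the longest common block, then blank it out.

    Hypothetical helper: assumes junk_a and junk_b never occur in a or b
    and differ from each other, so blanked-out spans can never match.
    Returns (i, j, substring) tuples in the order they were found.
    """
    blocks = []
    a_work, b_work = a, b
    while True:
        matcher = SequenceMatcher(None, a_work, b_work, autojunk=False)
        # Explicit bounds: the no-argument form only exists on Python 3.9+.
        m = matcher.find_longest_match(0, len(a_work), 0, len(b_work))
        if m.size == 0 or m.size < min_size:
            break
        # Replacement keeps lengths unchanged, so indices still refer to a and b.
        blocks.append((m.a, m.b, a[m.a:m.a + m.size]))
        # Overwrite the matched span in both strings with non-matching junk.
        a_work = a_work[:m.a] + junk_a * m.size + a_work[m.a + m.size:]
        b_work = b_work[:m.b] + junk_b * m.size + b_work[m.b + m.size:]
    return blocks


print(all_common_blocks("ABCDPQRUVWXYZ", "PQRABCDUVWXYZ"))
# → [(7, 7, 'UVWXYZ'), (0, 3, 'ABCD'), (4, 0, 'PQR')]
```

Because a fresh matcher is built on each pass, the blanked-out text is invisible to later searches, which lets "PQR" be reported even though it is out of order relative to "ABCD". Rebuilding the matcher every time is fine for short strings but gets slow for long ones.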

User

Answer 3 · edited 2024-09-26 17:50:05

The docs state:

get_matching_blocks()
Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.

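If the function returned "PQR" in your example, j would not be monotonically increasing: it would jump back from j = 3 (where the "A" of the "ABCD" match sits in str2) to j = 0 (where the "P" of the "PQR" match sits).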
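To see this concretely, the sketch below prints the triples for the question's strings and checks that both indices never decrease. Passing autojunk=False by keyword is equivalent to the question's fourth positional argument; the monotonicity check is an illustration, not part of difflib.

```python
from difflib import SequenceMatcher

str1 = "ABCDPQRUVWXYZ"
str2 = "PQRABCDUVWXYZ"

blocks = SequenceMatcher(None, str1, str2, autojunk=False).get_matching_blocks()
for block in blocks:
    # Each block is a Match(a, b, size) named tuple: str1[a:a+size] == str2[b:b+size]
    print(tuple(block), repr(str1[block.a:block.a + block.size]))
# (0, 3, 4) 'ABCD'
# (7, 7, 6) 'UVWXYZ'
# (13, 13, 0) ''

# Both indices must be non-decreasing from one block to the next.
monotonic = all(x.a <= y.a and x.b <= y.b for x, y in zip(blocks, blocks[1:]))
print(monotonic)  # → True

# "PQR" sits at i=4 in str1 but j=0 in str2; placing it after "ABCD" (j=3)
# would make j drop from 3 to 0, so it cannot appear in this list.
print(str1.index("PQR"), str2.index("PQR"))  # → 4 0
```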
