Finding a string pattern in Python

Published 2024-09-29 02:15:42


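My function is supposed to take a large string, iterate over it, and find the maximum number of times the pattern "AGATC" repeats consecutively. No matter what I pass to the function, it always returns 1.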

def agatc(s):
    maxrep = 0
    temp = 0
    for i in range(len(s) - 4):
        if s[i] == "A" and s[i + 1] == "G" and s[i + 2] == "A" and s[i + 3] == "T" and s[i + 4] == "C":
            temp += 1
            print(i)
            i += 3
        else:
            if temp > maxrep:
                maxrep = temp
            temp = 0
    return maxrep

I also tried initializing the for loop with (0, len(s) - 4, 1) and got the same return value.

I thought the problem might be adding 3 to the variable i (apparently not), so I added print(i) to see what was happening. I got the following:

45
1938
2049
2195
2952
2957
2962
2967
2972
2977
2982
2987
2992
2997
3002
3007
3012
3017
3022
3689
4754

3 Answers

This way you can find the number of overlapping matches:

def agatc(s):
    temp = 0
    for i in range(len(s) - len("AGATC") + 1):
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
    return temp

If you want to find non-overlapping matches:

def agatc(s):
    temp = 0
    i = 0
    while i < len(s) - len("AGATC") + 1:
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
            i += len("AGATC")
        else:
            i += 1
    return temp
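
Both functions above count every match in the string, while the question asks for the longest run of back-to-back repeats. The loop in the question always returns 1 because assigning to i inside a for loop over range() does not skip ahead: the next iteration overwrites i, so the position right after a match fails the comparison and resets temp. Here is a minimal sketch of one possible fix, using a while loop so the index can really jump past each match. The name longest_run and the pattern parameter are made up for illustration, and the sketch assumes the pattern cannot overlap a shifted copy of itself (true for "AGATC").

```python
def longest_run(s, pattern="AGATC"):
    """Return the largest number of back-to-back copies of pattern in s.

    Assumes a non-empty pattern that cannot overlap a shifted copy of itself.
    """
    step = len(pattern)
    best = 0      # longest run seen so far
    current = 0   # length of the run currently being extended
    i = 0
    while i <= len(s) - step:
        if s[i:i + step] == pattern:
            current += 1
            # Update immediately so a run that ends at the end of s is counted.
            best = max(best, current)
            i += step          # jump past the whole match
        else:
            current = 0        # the run is broken
            i += 1
    return best
```

For the sample string 'FGHAGATCATCFJSFAGATCAGATCFHGH' this returns 2, since the two copies starting at index 15 are adjacent.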

Using the re module:

import re

s = 'FGHAGATCATCFJSFAGATCAGATCFHGH'
match = re.finditer('(?P<name>AGATC)+', s)
max_len = 0
result = tuple()
for m in match:
    l = m.end() - m.start()
    if l > max_len:
        max_len = l
        result = (m.start(), m.end())

print(result)
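
The snippet above prints the (start, end) span of the longest run, not the number of repeats. A short sketch of how the count could be derived from the same kind of match, by dividing the matched length by the pattern length. The function name longest_run_re is hypothetical, and re.escape is used only so that patterns containing regex metacharacters are matched literally.

```python
import re

def longest_run_re(s, pattern="AGATC"):
    """Return the longest count of consecutive repeats of a non-empty pattern."""
    # (?:...)+ matches one or more adjacent copies of the pattern.
    run = re.compile("(?:%s)+" % re.escape(pattern))
    # Each match is a whole run, so its length is a multiple of len(pattern).
    lengths = (len(m.group(0)) // len(pattern) for m in run.finditer(s))
    return max(lengths, default=0)   # 0 when the pattern never occurs
```

With the string s defined above, longest_run_re(s) gives 2, matching the span (15, 25) that the original snippet prints.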

Personally, I would use a regular expression. But if you don't want to do that, you can use the str.find() method. Here is my solution:

def agatc(s):
    cnt = 0
    findstr = 'aga'                           # pattern you are looking for
    while True:
        index = s.find(findstr)
        if index == -1:                       # no more matches left
            break
        cnt += 1
        s = s[index+1:]                       # overlapping matches
        # s = s[index+len(findstr):]          # non-overlapping matches only
        print(index, s)                       # just to see what happens
    return cnt
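
The function above re-slices s after every hit, which copies the rest of the string each time. str.find also accepts a start index, so the same counting can be done without slicing. The following is only a sketch along those lines: count_matches and its overlapping flag are illustrative names, not part of the answer above.

```python
def count_matches(s, pattern="AGATC", overlapping=False):
    """Count occurrences of pattern in s using str.find with a start index."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Advance by 1 to allow overlaps, or by the full pattern length to forbid them.
    step = 1 if overlapping else len(pattern)
    count = 0
    pos = s.find(pattern)
    while pos != -1:
        count += 1
        pos = s.find(pattern, pos + step)   # resume the search after this hit
    return count
```

Like the function above, this counts total matches; it does not measure the longest consecutive run.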
