Python regular expression findall lookahead

Posted 2024-06-28 20:34:48


I have written a function that searches for open reading frames in a protein string. Here it is:

import re

from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML


def orf_finder(seq, format):
    record = SeqIO.read(seq, format)  # Reads in the sequence and tells Biopython what format it is.
    string = []  # Creates an empty list.

    for i in range(3):
        string.append(record.seq[i:])  # Builds a list of three sequences, one per reading frame.

    protein_string = []  # Creates an empty list.
    protein_string.append([str(i.translate()) for i in string])  # Translates each frame in 'string' and stores the results as one nested list.
    regex = re.compile('M''[A-Z]'+r'*')  # Compiles a regular expression pattern: methionine, followed by any amino acid and ending with a stop codon.
    res = max(regex.findall(str(protein_string)), key=len)  # res is a string of the longest translated ORF in the sequence.
    print("The longest ORF (translated) is:\n\n", res, "\n")
    print("The first blast result for this protein is:\n")

    blast_records = NCBIXML.parse(NCBIWWW.qblast("blastp", "nr", res))  # BLASTs the sequence and puts the results into a 'record object'.
    blast_record = next(blast_records)

    counter = 0  # Used to output only the first BLAST hit: once it is printed, the counter equals 1 and nothing more is printed.
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if counter < 1:  # Mechanism for stopping the output.
                print('Sequence:', alignment.title)
                print('Length:', alignment.length)
                print('E value:', hsp.expect)
                print('Query:', hsp.query[0:])
                print('Match:', hsp.match[0:])
                counter = 1

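The only problem is that I don't think my regular expression, re.compile('M''[A-Z]'+r'*'), finds overlapping matches. I know that a lookahead clause, ?=, might solve my problem, but I can't seem to implement it without getting an error.

Does anyone know how I can get it to work?

The code above uses Biopython to read in the DNA sequence, translate it, and then search for protein reading frames: a sequence that starts with 'M' and ends with '*'.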
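A side note on the pattern quoted in the question: Python concatenates adjacent string literals, so 'M''[A-Z]'+r'*' is the single pattern M[A-Z]*, in which * acts as a quantifier rather than a literal stop symbol. The sketch below illustrates this on a made-up protein string (hypothetical data, not taken from the question):

```python
import re

# The expression from the question; adjacent literals are joined by Python.
pattern_text = 'M''[A-Z]'+r'*'
print(pattern_text)  # M[A-Z]*

# Hypothetical translated reading frame; '*' marks a stop codon.
protein = "KMAMLK*QQMT*"

# '*' here means "zero or more of [A-Z]", so matches stop just before the stop symbol,
# and the inner 'M' at index 3 is swallowed by the first match.
matches = re.compile(pattern_text).findall(protein)
print(matches)  # ['MAMLK', 'MT']
```

Neither match includes the trailing '*', and the shorter candidate starting at the second 'M' is not reported on its own.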


1 answer

User

#1 · Posted 2024-06-28 20:34:48

re.compile(r"M[A-Z]+\*")

This assumes the string you are searching for starts with 'M', is followed by one or more uppercase letters 'A-Z', and ends with '*'.
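Used directly with findall, this pattern returns non-overlapping matches only. If overlapping ORFs are also wanted, one common approach is to place the same pattern in a capturing group inside a zero-width lookahead, so that a match can begin at every 'M'. A minimal sketch on a made-up protein string; the helper name find_overlapping_orfs and the sample data are assumptions for illustration only:

```python
import re

# Pattern from the answer: 'M', one or more uppercase letters, then a literal '*'.
ORF = re.compile(r"M[A-Z]+\*")

# Same pattern inside a lookahead. The lookahead consumes no characters, so the
# scan advances one position at a time; the group captures the ORF text itself.
ORF_OVERLAPPING = re.compile(r"(?=(M[A-Z]+\*))")


def find_overlapping_orfs(protein):
    """Return every M...* stretch, including those nested inside a longer one."""
    return ORF_OVERLAPPING.findall(protein)


# Hypothetical translated reading frame; '*' marks a stop codon.
protein = "KMAMLK*QQMT*"

plain = ORF.findall(protein)
overlapping = find_overlapping_orfs(protein)
longest = max(overlapping, key=len)

print(plain)        # ['MAMLK*', 'MT*']
print(overlapping)  # ['MAMLK*', 'MLK*', 'MT*']
print(longest)      # MAMLK*
```

Because findall returns the contents of the group when a pattern has exactly one capturing group, the result is still a flat list of strings and can be passed to max(..., key=len) as in the question.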
