Remove all occurrences of a letter and replace them with the error count

Posted 2024-10-06 11:24:59


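I have some code that is supposed to take a DNA sequence containing errors, remove all of the errors (N, in my case), and report how many N's were removed at that position.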

My code:

class dnaString (str):
    def __new__(self,s):
        #the inputted DNA sequence is converted as a string in all upper cases
        return str.__new__(self,s.upper())      
    def getN (self):
        #returns the count of value of N in the sequence
        return self.count("N")
    def remove(self):

        print(self.replace("N", "{}".format(coolString.getN())))
#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()

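When I enter AaNNNNNNGTC, I should get AA{6}GTC as the answer, but when I run the code it prints AA666666GTC, because I end up replacing every single error with the count. How can I insert the count only once?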


3 answers

This is the behavior the documentation for str.replace describes:

Return a copy of string s with all occurrences of substring old replaced by new.
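
To make the quoted behavior concrete, here is a small sketch using the upper-cased input from the question. It also shows the optional third argument of str.replace, which limits the number of replacements; that alone does not solve the problem, because the remaining N's are left in place. The variable names are illustrative only.

```python
s = "AANNNNNNGTC"
n = s.count("N")  # 6

# Every "N" is replaced independently, so the count appears six times.
all_replaced = s.replace("N", str(n))

# The optional third argument limits the number of replacements,
# but the remaining N's are left untouched.
first_only = s.replace("N", str(n), 1)

print(all_replaced)  # AA666666GTC
print(first_only)    # AA6NNNNNGTC
```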

One solution is to use a regular expression. re.sub can accept a callable that generates the replacement string:

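import re

def replace_with_count(x):
    # x is the match object for one run of N's; emit its length in braces
    return "{%d}" % len(x.group())

test = 'AaNNNNNNGTNNC'

# N+ matches each maximal run of consecutive N's
print(re.sub('N+', replace_with_count, test))  # Aa{6}GT{2}C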
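
As a rough sketch of how this could fit into the class from the question: the version below keeps the upper-casing in __new__ and returns the collapsed string instead of printing it. The names DnaString and collapse_errors are hypothetical, chosen for illustration, and are not part of the original code.

```python
import re


class DnaString(str):
    def __new__(cls, s):
        # Normalise the sequence to upper case, as in the question.
        return super().__new__(cls, s.upper())

    def collapse_errors(self):
        # Replace each run of consecutive N's with its length in braces.
        return re.sub(r"N+", lambda m: "{%d}" % len(m.group()), self)


print(DnaString("AaNNNNNNGTC").collapse_errors())  # AA{6}GTC
```

Using the match object's own length means each run gets its own count, rather than the total number of N's in the whole sequence.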

Not the cleanest solution, but it does work:

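from itertools import accumulate
from operator import add

s = "AaNNNNNNGTC"
# accumulate builds 'N', 'NN', ..., up to 100 N's; replace the longest runs first
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
    s = s.replace(i[1], '{'+str(i[0] + 1)+'}')
# s is now 'Aa{6}GTC'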

If you want to do this without importing any libraries, you can use the following approach:

def fix_dna(dna_str):
    fixed_str = ''
    n_count = 0  # length of the current run of N's
    for ch in dna_str:
        if ch.upper() == 'N':
            n_count += 1
        else:
            if n_count:
                # a run just ended: emit its length once
                fixed_str += '{' + str(n_count) + '}'
                n_count = 0
            fixed_str += ch
    # flush a run that reaches the end of the string
    if n_count:
        fixed_str += '{' + str(n_count) + '}'
    return fixed_str
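
If a standard-library import is acceptable, the same run-counting idea can also be sketched with itertools.groupby, which groups consecutive characters by a key. This is an alternative, not the answerer's method; the function name collapse_n_runs is hypothetical.

```python
from itertools import groupby


def collapse_n_runs(dna_str):
    parts = []
    # groupby yields one group per maximal run of characters sharing the
    # same key; here the key is "is this character an N (any case)?"
    for is_n, group in groupby(dna_str, key=lambda ch: ch.upper() == "N"):
        run = "".join(group)
        parts.append("{%d}" % len(run) if is_n else run)
    return "".join(parts)


print(collapse_n_runs("AaNNNNNNGTC"))  # Aa{6}GTC
```

Because each group is handled as a whole, a run of N's at the very end of the sequence is counted as well.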
