Count consecutive letters and hyphens and encode them as run lengths

User

#1 · edited 2024-09-27 02:20:08

The classic approach:

seq = "ATGC----CGCTA-----G---"

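def MD(c):
    # Letters map to "M"; everything else (such as a hyphen) maps to "D".
    return "M" if c.isalpha() else "D"

count = 1
string = ""
for i in range(len(seq) - 1):
    if MD(seq[i]) == MD(seq[i + 1]):
        count += 1
    else:
        # The run ends here: emit its length and type, then start a new run.
        string = string + str(count) + MD(seq[i])
        count = 1
# Emit the final run, which the loop itself never flushes.
string = string + str(count) + MD(seq[-1])
print(string)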

User

#2 · edited 2024-09-27 02:20:08

This problem is an ideal fit for itertools.groupby.

Implementation

from itertools import groupby

# seq is the input sequence from the question.
encoded = ''.join('{}{}'.format(len(list(g)), 'DM'[k])
                  for k, g in groupby(seq, key=str.isalpha))
print(encoded)

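Output: "4M4D5M5D1M3D"

Explanation

The key function is the crucial part here. It groups the sequence by whether each character is alphabetic. Once that is done, it is straightforward to count the size of each group and to read the group's type off the key.

Some notes on the code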

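'DM'[k]: a compact way of writing "M" if k == True else "D"
len(list(g)): determines the size of each group. Alternatively, it can be written as sum(1 for e in g)
'{}{}'.format: string formatting, used to concatenate each run's count with its type
''.join(: joins the resulting pieces into a single string.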
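
To make the grouping step concrete, the sketch below prints the (key, run) pairs that groupby yields before they are formatted and joined. The helper name run_length_encode and the sample input are assumptions for illustration; they are not part of the original answer.

```python
from itertools import groupby


def run_length_encode(seq):
    """Hypothetical helper: runs of letters become M, runs of anything else become D."""
    parts = []
    for is_letter, group in groupby(seq, key=str.isalpha):
        run = list(group)  # consume the group iterator to learn its size
        # is_letter is a bool, so it indexes 'DM' as 0 (D) or 1 (M).
        parts.append('{}{}'.format(len(run), 'DM'[is_letter]))
    return ''.join(parts)


# Assumed sample input, chosen to match the output quoted in this answer.
sample = "ATGC----CGCTA-----G---"

# Show the intermediate (key, run) pairs produced by groupby.
for is_letter, group in groupby(sample, key=str.isalpha):
    print(is_letter, ''.join(group))

print(run_length_encode(sample))  # 4M4D5M5D1M3D
```

Each group iterator can only be consumed once, which is why the helper materializes it with list() before taking its length.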

User

#3 · edited 2024-09-27 02:20:08

import re
seq = 'ATGC----CGCTA-----G---'

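output = ''
# Split on runs of hyphens; the capture group keeps those runs in the result.
# '-+' is used instead of '-*' because '-*' can match the empty string, and
# Python 3.7+ would then split between every character.
for section in re.split('(-+)', seq):
    if section.isalpha():
        output += str(len(section)) + 'M'
    elif section != '':
        output += str(len(section)) + 'D'
print(output)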
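
As a possible variation on the regex idea, re.finditer can walk the runs directly, which avoids filtering out empty strings. This is only a sketch: the helper name encode_with_regex is hypothetical, and it assumes every non-hyphen character should count as M, which differs from the isalpha() check used above.

```python
import re


def encode_with_regex(seq):
    """Hypothetical helper: runs of hyphens become D, runs of anything else become M."""
    parts = []
    # '-+' matches a run of hyphens; '[^-]+' matches a run of any other characters.
    for match in re.finditer(r'-+|[^-]+', seq):
        run = match.group()
        label = 'D' if run[0] == '-' else 'M'
        parts.append('{}{}'.format(len(run), label))
    return ''.join(parts)


print(encode_with_regex("ATGC----CGCTA-----G---"))  # 4M4D5M5D1M3D
```

If the input may contain characters other than letters and hyphens, the isalpha()-based versions above will label them differently from this sketch.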
