Complex regex matching in Python

for i, line in enumerate(sacc_gff):
    for match in re.finditer(chromo_val, line):
        print(line)
        for match in re.finditer(r"[ATGC]{%d},{%d}\Z" % (int(amino_start), int(amino_end)), line):
            print(match.group())

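2 answers

User

Answer 1 · edited 2024-09-28 01:28:54

It looks like you are working with FASTA data, so my answer assumes that. If that is not the case, you can still use the subsequence selection part.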

fasta_data = {} # creates an empty dictionary
with open( fasta_file, 'r' ) as fh:
    for line in fh:
        if line[0] == '>':
            seq_id = line.rstrip()[1:] # strip newline character and remove leading '>' character
            fasta_data[seq_id] = ''
        else:
            fasta_data[seq_id] += line.rstrip()

# return substring from chromosome 'chrI' with a first character at amino_start up to but not including amino_end
sequence_string1 = fasta_data['chrI'][amino_start:amino_end]
# return substring from chromosome 'chrII' with a first character at amino_start up to and including amino_end
sequence_string2 = fasta_data['chrII'][amino_start:amino_end+1]

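FASTA format: each record starts with a header line that begins with '>' followed by the sequence ID, and the lines after it hold the sequence until the next header.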
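One thing worth checking before slicing is the coordinate convention. Python slices are 0-based and exclude the end index, while GFF coordinates are 1-based and inclusive. Here is a minimal sketch of the conversion, assuming amino_start and amino_end come from a GFF file (a guess based on the variable name sacc_gff). The helper names read_fasta and slice_1based and the demo data are made up for illustration.

```python
import io


def read_fasta(handle):
    """Return {sequence_id: sequence} from an open FASTA handle."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # Use the first word of the header as the id (an assumption;
            # the code above keeps the whole header line instead).
            words = line[1:].split()
            seq_id = words[0] if words else ''
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


def slice_1based(sequence, start, end):
    """Return bases start..end, where both are 1-based and inclusive."""
    return sequence[start - 1:end]


# Hypothetical in-memory data so the sketch runs without a file.
demo = io.StringIO(">chrI demo\nATCGAC\nTACAAATTT\n>chrII\nACCTGCCGTAAAAATTTCC\n")
fasta_data = read_fasta(demo)
print(slice_1based(fasta_data['chrI'], 2, 14))  # TCGACTACAAATT
```

If your coordinates are already 0-based, the plain slices shown above are the right form and no offset is needed.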

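User

Answer 2 · edited 2024-09-28 01:28:54

Since you are working with a FASTA file in the following format:

>Chr1
ATCGACTACAAATTT
>Chr2
ACCTGCCGTAAAAATTTCC

and you are in bioinformatics, I am guessing you will be manipulating sequences often, so I recommend installing the Perl package called FAST. Once it is installed, you can use it to extract characters 2-14 of each sequence.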
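For comparison, the same extraction (characters 2-14 of each sequence) can be sketched in plain Python. This is not the FAST command, only a rough equivalent. It assumes the range is 1-based and inclusive, and the function name cut_records is made up for illustration.

```python
def cut_records(fasta_text, start, end):
    """Yield (header, subsequence) pairs; start and end are 1-based, inclusive."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)[start - 1:end]
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, ''.join(chunks)[start - 1:end]


# The example records from the answer above.
example = """>Chr1
ATCGACTACAAATTT
>Chr2
ACCTGCCGTAAAAATTTCC
"""

for name, subseq in cut_records(example, 2, 14):
    print('>' + name)
    print(subseq)
```

The generator joins each record's lines before slicing, so a sequence wrapped across several lines is handled the same way as a single-line one.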

FAST is described in a recent publication, and it provides a toolbox for manipulating molecular sequence data on the command line.
