Python: How to extract DNA sequences based on a text file with binary content?

User

Reply 1 · edited 2024-09-29 23:24:34

Because it is better to use Biopython for this:

from Bio import SeqIO

# A line containing "1" in mask.txt means: keep the sequence at that position
mask = ["1" == line.strip() for line in open("mask.txt")]
seqs = list(SeqIO.parse("input.fasta", "fasta"))
seqs_filter = [seq for flag, seq in zip(mask, seqs) if flag]
for seq in seqs_filter:
    print(seq.format("fasta"))

You will get:

^{pr2}$

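Parse FASTA: a FASTA record may spread its sequence over several lines (check the FASTA format), so it is better to use a dedicated library to read (parse) the input and write the output.

Mask: I read the mask file and convert it to booleans, for example [False, True, True].

Filter: the zip function pairs each sequence with its mask flag, and a list comprehension keeps only the flagged sequences.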
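The three steps above can be sketched without Biopython for readers who want to see the mechanics. This is only an illustration under stated assumptions: `read_fasta` is a hypothetical minimal reader (not part of any library), the sample data is made up, and a real script should still prefer a dedicated parser such as `Bio.SeqIO`.

```python
import io
from itertools import compress


def read_fasta(handle):
    """Yield (header, sequence) pairs; hypothetical minimal reader.

    Sequence lines that belong to one record are joined, so records
    spanning several lines are handled.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample data standing in for input.fasta and mask.txt
fasta_text = ">seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\nTT\n"
mask_text = "0\n1\n1\n"

# Mask: one flag per record, "1" means keep
mask = [line.strip() == "1" for line in io.StringIO(mask_text)]

# Parse: one (header, sequence) pair per record
records = list(read_fasta(io.StringIO(fasta_text)))

# Filter: compress() keeps the items whose flag is true
kept = list(compress(records, mask))

for header, sequence in kept:
    print(">" + header)
    print(sequence)
```

`itertools.compress` does the same job as the `zip` plus list comprehension pair; either form works, and the comprehension may read more naturally when extra conditions are needed.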

User

Reply 2 · edited 2024-09-29 23:24:34

You can build a list that serves as the mask while you read the FASTA file:

with open('mask.txt') as mf:
    mask = [ s.strip() == '1' for s in mf.readlines() ]

Then:

^{pr2}$

Or:

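# In Python 3 the built-in zip is already lazy, so itertools.izip is not needed.
# Lines are paired one-to-one: this assumes one sequence line per mask line.
for b, line in zip(open(mask_file), open(seq_file)):
    if b.strip() == '1':
        print(line, end='')  # placeholder: do *something* with line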
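The line-by-line pairing above only lines up when the sequence file has exactly one line per mask line. If each record is instead a header line followed by a single sequence line, a lazy variant can pair every mask line with two FASTA lines. The following is a sketch under that assumption; `filter_two_line_fasta` and the sample data are hypothetical and only for illustration.

```python
import io


def filter_two_line_fasta(mask_handle, fasta_handle):
    """Lazily yield (header, sequence) for records whose mask line is '1'.

    Assumes each record is exactly two lines: a '>' header line and
    one sequence line. Hypothetical helper for illustration.
    """
    fasta_lines = iter(fasta_handle)
    # zip(it, it) on the same iterator consumes the lines in pairs
    record_pairs = zip(fasta_lines, fasta_lines)
    for flag, (header, sequence) in zip(mask_handle, record_pairs):
        if flag.strip() == "1":
            yield header.strip(), sequence.strip()


# Made-up in-memory stand-ins for the mask file and the sequence file
mask_handle = io.StringIO("1\n0\n1\n")
fasta_handle = io.StringIO(">a\nAAAA\n>b\nCCCC\n>c\nGGGG\n")

selected = list(filter_two_line_fasta(mask_handle, fasta_handle))
for header, sequence in selected:
    print(header)
    print(sequence)
```

Because both inputs are consumed lazily, nothing beyond the current record is held in memory. Records whose sequence spans several lines break the two-line assumption and need a real parser.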

User

Reply 3 · edited 2024-09-29 23:24:34

I think this may help you, and I really think you should spend some time learning Python. Python is a good language for bioinformatics.

# Read the mask: one 0/1 flag per sequence
display = []
with open('test.txt') as f:
    for line in f:
        display.append(int(line.strip()))

# Keep every line (header and sequence lines) of the flagged records
output_DNA = []
with open('XX.fasta') as f:
    index = -1
    for line in f:
        if line[0] == '>':
            index = index + 1  # a header line starts the next record

        if display[index]:
            output_DNA.append(line)

print(output_DNA)
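The loop above indexes `display` by record number, so it relies on the mask file having one entry per header line. A small check before filtering can make a mismatch visible early. This is a sketch only: `count_records` and `check_mask_length` are hypothetical helper names, and the sample data is made up.

```python
def count_records(fasta_lines):
    """Count header lines, i.e. lines that start with '>'."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def check_mask_length(mask, fasta_lines):
    """Raise ValueError unless there is exactly one mask entry per record."""
    n_records = count_records(fasta_lines)
    if len(mask) != n_records:
        raise ValueError(
            "mask has %d entries but FASTA has %d records" % (len(mask), n_records)
        )
    return n_records


# Made-up sample: two records, the second one spans two sequence lines
fasta_lines = [">s1\n", "ACGT\n", ">s2\n", "GGCC\n", "GG\n"]

n_ok = check_mask_length([1, 0], fasta_lines)  # lengths agree

try:
    check_mask_length([1], fasta_lines)  # one mask entry is missing
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```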
