Using set() and FastqGeneralIterator() to extract a subset of sequences from a fastq file

from Bio.SeqIO.QualityIO import FastqGeneralIterator

infileR1= open('R1.fastq', 'r')
infileR2= open('R2.fastq', 'r')
output1= open('matchedR1.fastq', 'w')
output2= open('matchedR2.fastq', 'w')

all_names1 = set()
for line in infileR1 :
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names1.add(read_name)

all_names2 = set()
for line in infileR2 :
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names2.add(read_name)

shared_names = set()
for item in all_names1:
    if item in all_names2:
        shared_names.add(item)

#printing out the files:
for title, seq, qual in FastqGeneralIterator(infileR1):
    if title in new:
        output1.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

for title, seq, qual in FastqGeneralIterator(infileR2):
    if title in shared_names:
        output2.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

infileR1.close()
infileR2.close()
output1.close()
output2.close()

1 answer

User

#1 · Posted 2024-06-25 23:31:07

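Without knowing the exact error (you should add a description of it rather than just saying "it fails"), my guess is that the file handle you are using has already been exhausted.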

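You open a handle with infileR1= open('R1.fastq', 'r').
You then read through the file with for line in infileR1: to collect the titles.
Finally, you pass that same handle to FastqGeneralIterator, but its position is already at the end of the file, so the iterator has nothing left to read and yields no records.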
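
The effect is easy to reproduce in isolation. The sketch below uses io.StringIO as an in-memory stand-in for the opened file (an assumption made only to keep the example self-contained; the read name and sequence are made up). For this purpose it tracks its position the same way a handle returned by open() does.

```python
import io

# In-memory stand-in for R1.fastq (hypothetical read name and sequence).
handle = io.StringIO("@read1 1:N:0\nACGT\n+\nIIII\n")

first_pass = [line for line in handle]   # reads every line, position ends at EOF
second_pass = [line for line in handle]  # handle is exhausted, nothing is yielded

handle.seek(0)                           # rewind to the start of the data
third_pass = [line for line in handle]   # the lines are available again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 4 0 4
```

The second pass is the situation FastqGeneralIterator finds itself in when it receives a handle that a for loop has already consumed.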

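You should either "rewind" the file with infileR1.seek(0) before the final loops (the same applies to infileR2), or, as the documentation suggests, change the code to pass the file name to the SeqIO wrapper: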

from Bio import SeqIO

infileR1.close()

for record in SeqIO.parse("R1.fastq", "fastq"):
    # Do business here
    pass
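
Putting this together, here is a minimal sketch of the whole filter that reopens each file for every pass instead of reusing a handle. To keep it runnable without Biopython, read_fastq is a simplified stand-in I wrote for FastqGeneralIterator (it assumes strict four-line records). The names read_names, write_matched, filter_pairs and the demo reads are likewise hypothetical, not part of the original code. Note that FastqGeneralIterator yields the title without its leading @ and with the full description, so the sketch compares only the first word of the title and writes the @ back on output.

```python
import os
import tempfile


def read_fastq(handle):
    """Stand-in for FastqGeneralIterator: yields (title, seq, qual).

    Assumes strict 4-line records; the title is returned without its "@".
    """
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the "+" separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def read_names(path):
    """First pass: collect the read name (first word of each title)."""
    with open(path) as handle:
        return {title.split()[0] for title, _seq, _qual in read_fastq(handle)}


def write_matched(in_path, out_path, shared_names):
    """Second pass: reopen the file and keep records whose name is shared."""
    kept = 0
    with open(in_path) as handle, open(out_path, "w") as out:
        for title, seq, qual in read_fastq(handle):
            if title.split()[0] in shared_names:
                out.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
                kept += 1
    return kept


def filter_pairs(r1, r2, out1, out2):
    """Write only the reads whose names occur in both input files."""
    shared = read_names(r1) & read_names(r2)  # set intersection
    return write_matched(r1, out1, shared), write_matched(r2, out2, shared)


# Demo on two tiny made-up files: reads a, b in R1 and b, c in R2.
tmp = tempfile.mkdtemp()
r1 = os.path.join(tmp, "R1.fastq")
r2 = os.path.join(tmp, "R2.fastq")
out1 = os.path.join(tmp, "matchedR1.fastq")
out2 = os.path.join(tmp, "matchedR2.fastq")
with open(r1, "w") as fh:
    fh.write("@a 1:N:0\nACGT\n+\nIIII\n@b 1:N:0\nTTTT\n+\nIIII\n")
with open(r2, "w") as fh:
    fh.write("@b 2:N:0\nGGGG\n+\nIIII\n@c 2:N:0\nCCCC\n+\nIIII\n")

counts = filter_pairs(r1, r2, out1, out2)
print(counts)  # prints: (1, 1)
```

With Biopython installed, read_fastq can be replaced by FastqGeneralIterator without changing the rest, since both yield (title, seq, qual) string tuples. Opening the files inside with blocks also means every pass starts from the beginning and no explicit close() or seek(0) is needed.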
