Using Biopython to get detailed information about an unknown sequence via BLAST

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

def find_organism(file):
    """
    Receives a fasta file with a single seq, and uses BLAST to find
    from which organism it was taken.
    """
    # get seq from fasta file
    seqRecord = SeqIO.read(file, "fasta")
    # run BLAST
    blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
    # get first hit
    blastRecord = NCBIXML.read(blastResult)
    firstHit = blastRecord.alignments[0]
    # get hit's gi number
    title = firstHit.title
    gi = title.split("|")[1]
    # search NCBI for the gi number
    ncbiResult = Entrez.efetch(db="nucleotide", id=gi, rettype="gb", retmode="text")
    ncbiResultSeqRec = SeqIO.read(ncbiResult, "gb")
    # get organism
    annotatDict = ncbiResultSeqRec.annotations
    return annotatDict['organism']

1 answer

User

#1 · Posted 2024-10-01 17:40:06

You can get the organism like this:

[...]
blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
blastRecord = NCBIXML.read(blastResult)

first_organism = blastRecord.descriptions[0]
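Note that descriptions[0] is a description object rather than a bare organism name; its title attribute is a string in which the organism usually appears as free text after the identifier fields. A minimal sketch of splitting such a title follows. The helper name split_hit_title and the example title are hypothetical, and the "tag|value|...| free text" layout is an assumption that does not hold for every database or BLAST version.

```python
def split_hit_title(title):
    """Split a BLAST hit title into identifier fields and free-text description.

    Assumes the layout "tag|value|tag|value| free text", which is common
    but not guaranteed; titles without "|" are returned as pure description.
    """
    parts = title.split("|")
    # Everything before the last "|" is treated as (tag, value) pairs;
    # the final part is the human-readable description.
    *id_parts, description = parts
    ids = {}
    for tag, value in zip(id_parts[0::2], id_parts[1::2]):
        ids[tag.strip()] = value.strip()
    return ids, description.strip()


# Illustrative, made-up title (not a real NCBI record):
example = "gi|0000000|gb|XX000000.1| Example organism strain X, complete genome"
ids, description = split_hit_title(example)
print(ids)
print(description)
```

With a real result you would pass blastRecord.descriptions[0].title to the helper. The organism still has to be read out of the description text, so the efetch route in the question remains the more reliable way to get a clean organism field.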

This at least saves you the efetch query. In any case, "blastn" can take a very long time, and if you plan to query NCBI on a large scale, you will get banned.
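If you do need to send several queries, spacing them out lowers the risk of being blocked. Below is a rough sketch of a throttle that enforces a minimum gap between calls. The Throttle class is hypothetical (it is not part of Biopython), and the 10 second interval is an assumption; check NCBI's current usage guidelines for the actual limits.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Assumed interval; adjust to whatever NCBI currently asks for.
throttle = Throttle(min_interval=10)

# Hypothetical usage, call wait() before each remote query:
#     for record in records:
#         throttle.wait()
#         result = NCBIWWW.qblast("blastn", "nt", record.seq)
```

For anything beyond a handful of sequences, a local BLAST installation is usually the better choice, since it avoids the remote queue and the rate limits altogether.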
