I need help because I'm stuck. I have a txt file with sequence IDs that looks like this:
tr|K9RTD0|K9RTD0_SYNP3
tr|K9RSV3|K9RSV3_SYNP3
tr|K9RRE8|K9RRE8_SYNP3
tr|K9RMU9|K9RMU9_SYNP3
Then I have a typical FASTA file:
>sp|P00115|CYC6_SYNP3 Cytochrome c6 OS=Synechococcus sp. (strain ATCC 27167 / PCC 6312) OX=195253 GN=petJ PE=1 SV=2
MKTLLTILALTLVTLTTWLSTPAFAADIADGAKVFSANCAACHMGGGNVVMANKTLKKEA
LEQFGMNSADAIMYQVQNGKNAMPAFGGRLSEAQIENVAAYVLDQSSKNWAG
>tr|K9RTH7|K9RTH7_SYNP3 N-acyl-D-glucosamine 2-epimerase OS=Synechococcus sp. (strain ATCC 27167 / PCC 6312) OX=195253 GN=Syn6312_2130 PE=4 SV=1
MAPQINFPFSDLIAGYVTSYDTETDIFGLKTSDGREFPVKLSPMAYAKVIQNFDEGYPDA
TSTMRAWLTPGRFLFVYGVFYPDTDVFDAKQVVFAGKKEDDYVFEKQDWWIQQINALGKF
YVKAQFGQEEIDYRNYRTDLSVSGERSSVKFRQETDTISRLVYGFATAFMMTGDEVFLEA
AEKGTEYLRDHMRFVDRDEDIIYWYHGIDVQGEKELKIFASEFGDDYDAIPAYEQIYALA
GPIQTYRCTGDPRILSDAEQTIKLFDKFFLDQSEYGGYFSHIDPLMLDPRSDSLGRNKAR
KNWNSVGDHAPAYLINLWLATGEQKYADMLEYTFDTIEKYFPDYENSPFVQERFYEDWSH
DTTWGWQQNRAVVGHNLKIAWNLMRMQSLKPKEQYVGLAQKIADLMPSVGSDQQRGGWSD
TVERLLTNNSKFHQFVWHDRKAWWQQEQAILAYLILGGILEHDDYHRLGREAAAFYNAWF
LDLEDGGVYFNVLANGISYLARGNERAKGSHSMSGYHSFELCYLAAVYTNFLITKHPMDF
YFKPLPNGFPDRILRVSPDILPPGSILLESVEIDGKAYTDFDSQALTVKLPETKERVKVK
VRLAPKS
>tr|K9RXQ9|K9RXQ9_SYNP3 Uncharacterized protein OS=Synechococcus sp. (strain ATCC 27167 / PCC 6312) OX=195253 GN=Syn6312_3008 PE=4 SV=1
MKVEILKKRLNKECPMTTTRMPEDVIQELKQIASLLVFWGYQPLIGADIGQGLRTDLEQL
EDDKVSALVASLKRHRVSDEVLQTALMETTIN
I need to compare these two files, find the sequence description for each ID, and print it. My code:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
import sys

p = "proteome.fasta"
file = "reference.txt"
out = "jopik.txt"
with open(out, "w") as o:
    sys.stdout = o
    for seq_record in SeqIO.parse(open(p, mode = "r"),"fasta"):
        seq_record.description=' '.join(seq_record.description.split()[1:])
        with open(file,"r") as f:
            line = f.readlines()
            print(line)
            if (seq_record.id == line):
                i = seq_record.description
                print(i)
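You are just missing some kind of loop over the lines of the ID file:
    for x in y:
As written, line is the whole list returned by .readlines(), so seq_record.id == line compares a string with a list and is never true. Also, file handles in Python are iterable (they yield one line at a time when opened in non-binary mode), so looping over the handle directly avoids loading the entire file into memory before you start iterating, which is what .readlines() does.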
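To make the suggestion above concrete, here is a minimal sketch of one way to do the lookup. It assumes the file names from the question (proteome.fasta, reference.txt, jopik.txt); the helper names load_ids, matching_descriptions and main are made up for illustration. Rather than re-reading the ID file for every record, it reads the IDs once into a set by iterating over the file handle and stripping the trailing newline from each line. It then checks each record's id against that set.

```python
def load_ids(path):
    """Read one ID per line by iterating over the file handle directly."""
    with open(path) as handle:
        # strip() removes the trailing newline; blank lines are skipped
        return {line.strip() for line in handle if line.strip()}


def matching_descriptions(records, wanted_ids):
    """Yield (id, description without the leading ID) for wanted records."""
    for record in records:
        if record.id in wanted_ids:
            # For FASTA input, description is the whole header line,
            # so drop its first word (the ID) as in the question.
            yield record.id, " ".join(record.description.split()[1:])


def main(fasta_path="proteome.fasta", ids_path="reference.txt",
         out_path="jopik.txt"):
    # Imported here so the helpers above work without Biopython installed.
    from Bio import SeqIO

    wanted = load_ids(ids_path)
    with open(out_path, "w") as out:
        records = SeqIO.parse(fasta_path, "fasta")
        for _record_id, description in matching_descriptions(records, wanted):
            # Write to the file explicitly instead of reassigning sys.stdout
            print(description, file=out)
```

Calling main() with the default arguments writes one description per matching record to jopik.txt. With the sample data shown in the question the output would be empty, because none of the four listed IDs appear among the three FASTA records.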