解析multifasta文件以提取序列

seq="" seqcount=0 seqlen=0 gc=0 #prompt user for file name infile=input("Enter the name of your designated .fasta file: ") with open(infile, "r") as fasta: print("\n") print ("Accession Number \t Sequence Length \t GC content (%)") for line in fasta: line.strip() if line[0]==">": seqcount+=1 #counts number sequences in file accession=line.split("|")[3] #extract accession seq="" else: seq+=line[:-1] seqlen=len(seq) print(accession, "\t \t", seqlen) print("\n") print("There are a total of", seqcount, "sequences in this file.")

1条回答

网友

1楼 · 发布于 2024-10-04 01:25:24

你离正确的密码不远了：

seq=""
seqcount=0

#prompt user for file name
infile=input("Enter the name of your designated .fasta file: ")

def pct_gc(s):
    gc = s.count('G') + s.count('C') + s.count('g') + s.count('c')
    total = len(s)

    return gc*100.0/total

with open(infile, "r") as fasta:
    print("\n")
    print ("Accession Number\tSequence Length\tGC content (%)")
    for line in fasta:
        line = line.strip()
        if line[0]==">":
            if seq != "":
                print("{}\t{}\t{}".format(accession, pct_gc(seq), len(seq)))

            seqcount+=1 #counts number sequences in file
            accession=line.split("|")[3] #extract accession
            seq=""
        else: 
            seq+=line[:-1]

print("{}\t{}\t{}".format(accession, pct_gc(seq), len(seq)))
print("\n")           
print("There are a total of " + str(seqcount) + " sequences in this file.")

需要注意的事项：

不需要在每次迭代中都更新长度。最后再计算一下。在
在str.str.带（）不修改对象，而是返回剥离对象
你必须使用这样一个事实：当你找到下一个序列时，你知道你读了一个完整的序列，而这个序列不是空的。此时必须编写输出。在
最后一个序列不是由新的加入完成的，所以您必须在循环结束后独立地处理它。在
使用字符串格式或串联字符串。如果只将字符串和变量用逗号隔开，则会得到一个元组表示输出。在

相关问题更多 >

编程相关推荐

热门问题

热门文章