Python线和图案匹配后打印

2024-09-30 08:31:07 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个这样的文件(有+10000个序列,+98000行):

>DILT_0000000001-mRNA-1
MKVVKICSKLRKFIESRKDAVLPEQEEVLADLWAFEGISEFQMERFAKAAQCFQHQYELA
IKANLTEHASRSLENLGRARARLYDYQGALDAWTKRLDYEIKGIDKAWLHHEIGRAYLEL
NQYEEAIDHAATARDVADREADMEWDLNATVLIAQAHFYAGNLEEAKVYFEAAQNAAFRK
GFFKAESVLAEAIAEVDSEIRREEAKQERVYTKHSVLFNEFSQRAVWSEEYSEELHLFPF
AVVMLRCVLARQCTVHLQFRSCYNL
>DILT_0000000101-mRNA-1
MSCRRLSMNPGEALIKESSAPSRENLLKPYFDEDRCKFRHLTAEQFSDIWSHFDLDGVNE
LRFILRVPASQQAGTGLRFFGYISTEVYVHKTVKVSYIGFRKKNNSRALRRWNVNKKCSN
AVQMCGTSQLLAIVGPHTQPLTNKLCHTDYLPLSANFA
>DILT_0001999301-mRNA-1
LEHGIQPDGQMPSDKTIGGGDDSFQTFFSETGAGKHVPRAVMVDLEPTVIGEYLCVLLTS
FILFRLISTNLGPNSQLASRTLLFAADKTTLFRLLGLLPWSLLKIAVQ
>DILT_0001999401-mRNA-1
MAENGEDANMPEEGKEGNTQDQGEHQQDVQSDEPNEADSGYSSAASSDVNSQTIPITVIL
PNREAVNLSFDPNISVSELQERLNGPGITRLNENLFFTYSGKQLDPNKTLLDYKVQKSST
LYVHETPTALPKSAPNAKEEGVVPSNCLIHSGSRMDENRCLKEYQLTQNSVIFVHRPTAN
TAVQNREEKTSSLEVTVTIRETGNQLHLPINPHXXXXTVEMHVAPGVTVGDLNRKIAIKQ

所有带“>;”的行都是ID。下面几行是关于ID的序列

我还有一个文件,上面有我想要的序列的ID,比如:

^{pr2}$

我想写一个脚本来匹配id并复制这个id的序列,但是我只是得到id,没有序列。在

seqs = open('WBPS10.protein.fa').readlines()
ids = open('ids.txt').readlines()

for line in ids:
    for record in seqs:
        if line == record[1:]:
            print record

我不知道该写什么来获取ID后面的n行,因为有时它是2行,其他序列有更多,正如您在我的示例中看到的那样。在

问题是,我试着不使用生物圈,这样会容易得多。我只想学别的方法。在


Tags: 文件inididsforline序列open
2条回答

这应该对你有用。if line == record[1:]:如果字符串中有特殊字符,则语句将不起作用,例如\r\n。您只对查找匹配的id感兴趣。下面的代码对你有用。在

代码示例

seqs = open('WBPS10.protein.fa').readlines()
ids = open('ids.txt').readlines()

for line in ids:
    for record in seqs:
        if line in  record :
            print record

输出:

^{pr2}$
seqs_by_ids = {}
with open('WBPS10.protein.fa', 'r') as read_file:
    for line in read_file.readlines():
        if line.startswith('>'):
            current_key = line[1:].strip()
            seqs_by_ids[current_key] = ''
        else:
            seqs_by_ids[current_key] += line.strip()

ids = set([line.strip() for line in open('ids.txt').readlines()])

for id in ids:
    if id in seqs_by_ids:
        print(id)
        print('\t{}'.format(seqs_by_ids[id]))

输出:

^{pr2}$

相关问题 更多 >

    热门问题