I have a file in the following format:
QUERY: STBZIP38
Length of Query Sequence: 2000 bp | Nucleotide Frequencies: A - 0.34 G - 0.16 T - 0.35 C - 0.15
TFBS AC: RSP00073//OS: tobacco (Nicotiana tabacum) /GENE: synthetic oligonucleotides/TFBS: PA /BF: TAF-1
Motifs on "+" Strand: Mean Exp. Number 0.00391 Up.Conf.Int. 1 Found 1
421 tCCACGTGGC 430 (Mism.= 1)
Motifs on "-" Strand: Mean Exp. Number 0.00391 Up.Conf.Int. 1 Found 1
430 GCCACGTGGa 421 (Mism.= 1)
TFBS AC: RSP00153//OS: Parsley, Petroselinum crispum /GENE: CHS/TFBS: Box II /BF: CPRF-1; CPRF-2; CPRF-3;
Motifs on "+" Strand: Mean Exp. Number 0.00358 Up.Conf.Int. 1 Found 1
422 CCACGTGGCa 431 (Mism.= 1)
TFBS AC: RSP00154//OS: parsley (Petroselinum crispum) /GENE: CHS/TFBS: ACE (CHS) /BF: bZIP factors CPRF1, CPRF4
Motifs on "+" Strand: Mean Exp. Number 0.00358 Up.Conf.Int. 1 Found 1
422 CCACGTGGCa 431 (Mism.= 1)
Totally 50 motifs of 43 different TFBSs have been found
____________________________________________________________
QUERY: STBZIP17
Length of Query Sequence: 2000 bp | Nucleotide Frequencies: A - 0.37 G - 0.13 T - 0.39 C - 0.11
TFBS AC: RSP00577//OS: tomato (Lycopersicon esculentum), Lycopersicon esculentum /GENE: rbcS3A/TFBS: AT-rich FF2 /BF: unknown nuclear factor
Motifs on "-" Strand: Mean Exp. Number 0.00187 Up.Conf.Int. 1 Found 1
206 AATAATTAaAcATTAATTAA 187 (Mism.= 2)
TFBS AC: RSP00797//OS: potato (Solanum tuberosum) /GENE: patatin 21/TFBS: SURE-1 /BF: SURF
Motifs on "-" Strand: Mean Exp. Number 0.00440 Up.Conf.Int. 1 Found 1
1027 TAAAGAATAaAAAAAaaAA 1009 (Mism.= 3)
TFBS AC: RSP00864//OS: arabidopsis (Arabidopsis thaliana) /GENE: STK/TFBS: GA-5 /BF: BPC1
Motifs on "-" Strand: Mean Exp. Number 0.00260 Up.Conf.Int. 1 Found 1
1966 AGAGAGAGA 1958 (Mism.= 0)
The output I want is:
STBZIP38 RSP00073//OS
STBZIP38 RSP00153//OS
STBZIP38 RSP00154//OS
STBZIP17 RSP00577//OS
STBZIP17 RSP00797//OS
STBZIP17 RSP00864//OS
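I'm working through some tutorials and trying to use the split function (I'm still learning the ABCs of Python). I started with the code below. What I'm still trying to figure out is how to grab only the word that follows the term I search for (e.g. for QUERY: grab only STBZIP38, then grab the accession that follows TFBS AC:). I'd really appreciate it if someone could help me with this. Thanks in advance.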
with open('Softberry.txt') as fo:
    for rec in fo:
        print((rec.split('QUERY:')) + ',' + (rec.split('TFBS AC:')))
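For reference, the attempt above fails because `rec.split(...)` returns a list, and adding the string `','` to a list raises a `TypeError`. Below is a minimal sketch of how the split idea could be made to work, assuming every record line starts with `QUERY:` or `TFBS AC:` exactly as in the sample file. The function name `extract_pairs` and the inline `sample` text are made up for illustration; in practice the lines would come from the opened file.

```python
def extract_pairs(lines):
    """Yield 'QUERY_NAME ACCESSION' strings from report lines."""
    query = None
    for line in lines:
        line = line.strip()
        if line.startswith('QUERY:'):
            # 'QUERY: STBZIP38' -> 'STBZIP38'
            query = line.split(':', 1)[1].strip()
        elif line.startswith('TFBS AC:') and query is not None:
            # 'TFBS AC: RSP00073//OS: tobacco ...' -> 'RSP00073//OS'
            accession = line.split(':')[1].strip()
            yield f'{query} {accession}'


# A few lines copied from the sample file (hypothetical stand-in for the real file).
sample = '''QUERY: STBZIP38
Length of Query Sequence: 2000 bp | Nucleotide Frequencies: A - 0.34 G - 0.16 T - 0.35 C - 0.15
TFBS AC: RSP00073//OS: tobacco (Nicotiana tabacum) /GENE: synthetic oligonucleotides/TFBS: PA /BF: TAF-1
421 tCCACGTGGC 430 (Mism.= 1)
TFBS AC: RSP00153//OS: Parsley, Petroselinum crispum /GENE: CHS/TFBS: Box II /BF: CPRF-1; CPRF-2; CPRF-3;
QUERY: STBZIP17
TFBS AC: RSP00577//OS: tomato (Lycopersicon esculentum), Lycopersicon esculentum /GENE: rbcS3A/TFBS: AT-rich FF2 /BF: unknown nuclear factor
'''

for pair in extract_pairs(sample.splitlines()):
    print(pair)
```

The key point is remembering the most recent `QUERY:` name while looping, so each later `TFBS AC:` line can be paired with it. With a real file, `extract_pairs(fo)` inside the `with open(...)` block should behave the same way.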
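Here is some good boilerplate to start from. I've prepared the exact regex patterns; you do the rest. PS: what you need is the readlines() method plus regex, not split.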
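A minimal sketch of what the readlines() plus regex approach might look like. This is not the answerer's original snippet, which does not appear on this page. The pattern names, the `parse_report` function, and the inline `sample` text are assumptions for illustration; `io.StringIO` stands in for `open('Softberry.txt')` so the example runs on its own.

```python
import io
import re

# re.match() only matches at the start of the line, so
# 'Length of Query Sequence: ...' is not mistaken for a QUERY line.
QUERY_RE = re.compile(r'QUERY:\s*(\S+)')
# Capture everything after 'TFBS AC:' up to the next colon or whitespace.
TFBS_RE = re.compile(r'TFBS AC:\s*([^:\s]+)')


def parse_report(lines):
    """Return a list of 'QUERY_NAME ACCESSION' strings."""
    results = []
    query = None
    for line in lines:
        m = QUERY_RE.match(line)
        if m:
            query = m.group(1)
            continue
        m = TFBS_RE.match(line)
        if m and query is not None:
            results.append(f'{query} {m.group(1)}')
    return results


# A few lines copied from the sample file (hypothetical stand-in for the real file).
sample = '''QUERY: STBZIP38
TFBS AC: RSP00153//OS: Parsley, Petroselinum crispum /GENE: CHS/TFBS: Box II /BF: CPRF-1; CPRF-2; CPRF-3;
422 CCACGTGGCa 431 (Mism.= 1)
____________________________________________________________
QUERY: STBZIP17
TFBS AC: RSP00864//OS: arabidopsis (Arabidopsis thaliana) /GENE: STK/TFBS: GA-5 /BF: BPC1
1966 AGAGAGAGA 1958 (Mism.= 0)
'''

with io.StringIO(sample) as fo:
    lines = fo.readlines()

for row in parse_report(lines):
    print(row)
```

Compared with split, the regex version states in one place what a valid line looks like, which can be easier to adjust if the report format changes.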
Output:
OK, guys. Here is how it ended up. Many thanks to CYREX and xelf (Reddit) for the help.