Python: if line.startswith("word"), check the 20th line after it

Posted 2024-09-21 05:41:35


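I'm checking a file. If a line starts with "SeqID", I want to look at the 21st line after it, and if that line starts with anything other than "Cytoplasmic", I want to write both the "SeqID" line and that line to an output file.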

So far I have:

```python
import sys
import argparse

def main(argv):
    parser = argparse.ArgumentParser(description='find a location')
    parser.add_argument('infile', help='file to process')
    parser.add_argument('outfile', help='file to produce')
    args = parser.parse_args(argv)
    tag = "SeqID:"
    tag2 = "Cytoplasmic"

    with open(args.infile, "r") as f, open(args.outfile, "w") as of:
        file_in = f.readlines()
        for line in file_in:
            # Note: line[21:] is characters 21 onward of this same line,
            # not the line 21 lines further down the file.
            if line.startswith(tag) and line[21:] != tag2:
                of.write(line)

if __name__ == "__main__":
    main(sys.argv[1:])
```

Here is a sample of the input file:

```
SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]
  Analysis Report:
    CMSVM-            Unknown                       [No details]
    CytoSVM-          Unknown                       [No details]
    ECSVM-            Unknown                       [No details]
    ModHMM-           Unknown                       [No internal helices found]
    Motif-            Unknown                       [No motifs found]
    OMPMotif-         Unknown                       [No motifs found]
    OMSVM-            OuterMembrane                 [No details]
    PPSVM-            Unknown                       [No details]
    Profile-          Unknown                       [No matches to profiles found]
    SCL-BLAST-        OuterMembrane                 [matched 60392864: Opacity protein opA54 precursor]
    SCL-BLASTe-       Unknown                       [No matches against database]
    Signal-           Unknown                       [No signal peptide detected]
  Localisation Scores:
    OuterMembrane          10.00
    Extracellular          0.00
    Periplasmic            0.00
    Cytoplasmic            0.00
    CytoplasmicMembrane    0.00
  Final Prediction:
    OuterMembrane          10.00

-------------------------------------------------------------------------------

SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]
  Analysis Report:
    CMSVM-            Unknown                       [No details]
    CytoSVM-          Unknown                       [No details]
    ECSVM-            Unknown                       [No details]
    ModHMM-           Unknown                       [No internal helices found]
    Motif-            Unknown                       [No motifs found]
    OMPMotif-         Unknown                       [No motifs found]
    OMSVM-            Unknown                       [No details]
    PPSVM-            Unknown                       [No details]
    Profile-          Unknown                       [No matches to profiles found]
    SCL-BLAST-        Unknown                       [No matches against database]
    SCL-BLASTe-       Unknown                       [No matches against database]
    Signal-           Unknown                       [No signal peptide detected]
  Localization Scores:
    CytoplasmicMembrane    2.00
    Cytoplasmic            2.00
    OuterMembrane          2.00
    Periplasmic            2.00
    Extracellular          2.00
  Final Prediction:
    Unknown
```
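
A minimal sketch of why the attempt in the question writes out every "SeqID:" line: `line[21:]` slices characters of the current line, while the line 21 rows further down is `file_in[i + 21]`. The list below is hypothetical; it reuses the first header from the sample and fills the 20 lines in between with placeholders standing in for the Analysis Report and score blocks.

```python
# Hypothetical list of lines shaped like one record from the sample:
# the "SeqID:" header, 20 placeholder lines, then the Final Prediction value.
file_in = (["SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]\n"]
           + ["placeholder\n"] * 20
           + ["    OuterMembrane          10.00\n"])

i, line = 0, file_in[0]

# What the question's condition compares: characters 21 onward of the SAME line,
# which can never equal "Cytoplasmic", so every SeqID line passes the test.
same_line_tail = line[21:]

# What was intended: the line 21 positions further down the list of lines.
target_line = file_in[i + 21]

print(repr(same_line_tail))
print(repr(target_line.strip()))
```

In other words, the lookahead has to index into the list of lines (or otherwise keep track of line positions), not slice the string.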


2 answers

My Python is a bit rusty, so bear with me. I hope I've inferred the desired output correctly; if not, please comment.

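This assumes that the samples from the sequencing experiment are always separated by 3 lines of arbitrary content and that each sample spans exactly 22 lines.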

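```python
import re

def extract_data(filename):
    numLinesToSkip = 3         # separator lines between samples (blank, dashes, blank)
    offset = 22                # lines per sample, starting at the "SeqID:" line
    seqIdLineNumber = 0        # index of the "SeqID:" line within a sample
    predictionLineNumber = 21  # index of the Final Prediction value within a sample
    output = []
    with open(filename, "r") as f:
        while True:
            # Read one whole sample; stop if the file runs out first.
            try:
                head = [next(f) for _ in range(offset)]
            except StopIteration:
                break
            # Collapse the prediction line's whitespace, e.g. "OuterMembrane 10.00".
            line21 = re.split(r'\s+', head[predictionLineNumber].strip())
            sample = head[seqIdLineNumber].rstrip() + "\t" + " ".join(line21)
            output.append(sample)
            # Skip the separator lines before the next sample.
            try:
                for _ in range(numLinesToSkip):
                    next(f)
            except StopIteration:
                break
    print("\n".join(output))


if __name__ == "__main__":
    extract_data("test.txt")
```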
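
The approach above depends on every sample being exactly 22 lines with 3 separator lines. A sketch that instead keys on the "Final Prediction:" heading, assuming (as in the sample) that the prediction is the first non-blank line after that heading, so record length does not matter. The function names `final_predictions` and `non_cytoplasmic` are hypothetical, and treating "CytoplasmicMembrane" as a location to keep is an assumption about what "anything other than Cytoplasmic" means.

```python
from typing import Iterable, List, Optional, Tuple


def final_predictions(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each "SeqID:" header with the first non-blank line after its
    "Final Prediction:" heading. Works on a list of lines or an open file."""
    results: List[Tuple[str, str]] = []
    seq_line: Optional[str] = None
    want_prediction = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("SeqID:"):
            # A new record starts; forget any unfinished one.
            seq_line = line
            want_prediction = False
        elif line.strip() == "Final Prediction:":
            want_prediction = seq_line is not None
        elif want_prediction and line.strip() and seq_line is not None:
            results.append((seq_line, line.strip()))
            seq_line, want_prediction = None, False
    return results


def non_cytoplasmic(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep records whose predicted location is not exactly "Cytoplasmic".
    Assumption: "CytoplasmicMembrane" is kept; use startswith() to drop it too."""
    return [(s, p) for s, p in records if p.split()[0] != "Cytoplasmic"]
```

With an open input file `f`, `non_cytoplasmic(final_predictions(f))` gives the (header, prediction) pairs to write out. The "Cytoplasmic 0.00" lines under the score headings are ignored because only the line after "Final Prediction:" is captured.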

您可以尝试使用以下方法:

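```python
# Replaces the `with` block inside main() from the question,
# where args (from argparse) and tag = "SeqID:" are already defined.
with open(args.infile, "r") as f, open(args.outfile, "w") as of:
    file_in = f.readlines()
    for i, line in enumerate(file_in):
        # Look 21 lines ahead of each "SeqID:" line, guarding against
        # running past the end of the file.
        if (line.startswith(tag)
                and i + 21 < len(file_in)
                and not file_in[i + 21].strip().startswith("Cytoplasmic")):
            of.write(line)
            of.write(file_in[i + 21])
```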
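
The answer above loads the whole file with `readlines()` so it can index `file_in[i + 21]`. A sketch of the same 21-line lookahead that consumes any iterable of lines (such as an open file) without holding it all in memory; the generator name `lookahead_pairs` and its parameters are hypothetical, and the exclusion test mirrors the answer's `strip().startswith("Cytoplasmic")`.

```python
from typing import Iterable, Iterator, List, Tuple


def lookahead_pairs(lines: Iterable[str], tag: str = "SeqID:", distance: int = 21,
                    exclude: str = "Cytoplasmic") -> Iterator[Tuple[str, str]]:
    """Yield (tag_line, target_line) for each line starting with `tag` whose
    line `distance` positions later does not start with `exclude` once
    leading whitespace is stripped. Assumes distance >= 1."""
    pending: List[Tuple[str, int]] = []  # (tag line, lines still to go)
    for line in lines:
        still_waiting: List[Tuple[str, int]] = []
        for tag_line, remaining in pending:
            if remaining == 1:
                # This line is exactly `distance` lines after tag_line.
                if not line.strip().startswith(exclude):
                    yield tag_line, line
            else:
                still_waiting.append((tag_line, remaining - 1))
        pending = still_waiting
        if line.startswith(tag):
            pending.append((line, distance))
    # Tag lines whose target would fall past the end of the input are dropped,
    # matching the answer's `(i + 21) < len(file_in)` guard.
```

Inside the question's `with` block, `for seq_line, target in lookahead_pairs(f): of.write(seq_line); of.write(target)` would write the same pairs as the loop above, since iterating a file object yields one line at a time.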
