Python: if line.startswith("word"), check the line 20 lines further down

    import sys
    import argparse
    import operator
    import re
    import itertools

    def main(argv):
        parser = argparse.ArgumentParser(description='find a location')
        parser.add_argument('infile', help='file to process')
        parser.add_argument('outfile', help='file to produce')
        args = parser.parse_args()
        tag = "SeqID:"
        tag2 = "Cytoplasmic"
        with open(args.infile, "r") as f, open(args.outfile, "w+") as of:
            file_in = f.readlines()
            for line in file_in:
                if line.startswith(tag) and line[21:] != "Cytoplasmic":
                    of.write(line)

    if __name__ == "__main__":
        main(sys.argv)

    SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]
      Analysis Report:
        CMSVM- Unknown [No details]
        CytoSVM- Unknown [No details]
        ECSVM- Unknown [No details]
        ModHMM- Unknown [No internal helices found]
        Motif- Unknown [No motifs found]
        OMPMotif- Unknown [No motifs found]
        OMSVM- OuterMembrane [No details]
        PPSVM- Unknown [No details]
        Profile- Unknown [No matches to profiles found]
        SCL-BLAST- OuterMembrane [matched 60392864: Opacity protein opA54 precursor]
        SCL-BLASTe- Unknown [No matches against database]
        Signal- Unknown [No signal peptide detected]
      Localisation Scores:
        OuterMembrane 10.00
        Extracellular 0.00
        Periplasmic 0.00
        Cytoplasmic 0.00
        CytoplasmicMembrane 0.00
      Final Prediction:
        OuterMembrane 10.00
    -------------------------------------------------------------------------------
    SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]
      Analysis Report:
        CMSVM- Unknown [No details]
        CytoSVM- Unknown [No details]
        ECSVM- Unknown [No details]
        ModHMM- Unknown [No internal helices found]
        Motif- Unknown [No motifs found]
        OMPMotif- Unknown [No motifs found]
        OMSVM- Unknown [No details]
        PPSVM- Unknown [No details]
        Profile- Unknown [No matches to profiles found]
        SCL-BLAST- Unknown [No matches against database]
        SCL-BLASTe- Unknown [No matches against database]
        Signal- Unknown [No signal peptide detected]
      Localization Scores:
        CytoplasmicMembrane 2.00
        Cytoplasmic 2.00
        OuterMembrane 2.00
        Periplasmic 2.00
        Extracellular 2.00
      Final Prediction:
        Unknown

2 answers

User

Answer 1 · edited 2024-09-21 05:41:35

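My Python is a bit rusty, so please bear with me. I hope I have inferred the desired output correctly; if not, please leave a comment.

This assumes that each sample from the sequencing experiment spans exactly 22 lines and that consecutive samples are always separated by 3 lines of arbitrary content.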

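    import re
    from itertools import islice

    def extract_data(filename):
        num_lines_to_skip = 3    # separator lines between two records
        record_length = 22       # lines per record
        seq_id_line = 0          # index of the "SeqID:" line within a record
        prediction_line = 21     # index of the final prediction within a record
        output = []
        with open(filename, "r") as f:
            while True:
                # Read one full record; stop once the file runs out.
                head = list(islice(f, record_length))
                if len(head) < record_length:
                    break
                # Collapse runs of whitespace in the prediction line.
                prediction = re.split(r'\s+', head[prediction_line].strip())
                sample = head[seq_id_line].rstrip() + "\t" + " ".join(prediction)
                output.append(sample)
                # Discard the separator lines before the next record.
                for _ in islice(f, num_lines_to_skip):
                    pass
        print("\n".join(output))

    if __name__ == "__main__":
        extract_data("test.txt")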
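The fixed layout assumed above (22 lines per record, 3 separator lines) breaks as soon as one record has an extra or missing line. As a hedged alternative, the sketch below looks for the "SeqID:" and "Final Prediction:" markers instead of counting lines. The function name `iter_records` is hypothetical, and it assumes (this is not confirmed by the question) that the prediction is always the first non-blank line after the "Final Prediction:" header.

```python
def iter_records(lines):
    """Yield (seq_id_line, prediction) pairs from PSORTb-style report lines.

    Assumption: every record starts with a line beginning "SeqID:" and its
    prediction is the first non-blank line after "Final Prediction:".
    """
    seq_id = None
    want_prediction = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("SeqID:"):
            # A new record begins; remember its header line.
            seq_id = line
            want_prediction = False
        elif line.startswith("Final Prediction:"):
            # The next non-blank line holds the prediction.
            want_prediction = seq_id is not None
        elif want_prediction and line:
            # Collapse runs of whitespace, e.g. "OuterMembrane    10.00".
            yield seq_id, " ".join(line.split())
            seq_id = None
            want_prediction = False
```

It accepts any iterable of lines, so an open file object can be passed directly, for example `for seq_id, prediction in iter_records(open("test.txt")): ...`.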

User

Answer 2 · edited 2024-09-21 05:41:35

You could try the following approach:

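    # args, tag and the two file handles are the ones from the question's main().
    with open(args.infile, "r") as f, open(args.outfile, "w+") as of:
        file_in = f.readlines()
        for i, line in enumerate(file_in):
            # Line i+21 is the final prediction of the record starting at line i.
            # Note: startswith("Cytoplasmic") also matches "CytoplasmicMembrane".
            if (line.startswith(tag)
                    and (i + 21) < len(file_in)
                    and not file_in[i + 21].strip().startswith("Cytoplasmic")):
                of.write(line)
                of.write(file_in[i + 21])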
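One caveat with the check above: `startswith("Cytoplasmic")` is also true for "CytoplasmicMembrane", so those records are dropped as well. If only the exact "Cytoplasmic" prediction should be excluded, comparing the first whitespace-separated field is one option. The following is a minimal sketch under that assumption; `filter_non_cytoplasmic` and its parameters are hypothetical names, and the offset of 21 lines is taken from the snippet above.

```python
def filter_non_cytoplasmic(lines, tag="SeqID:", offset=21, excluded="Cytoplasmic"):
    """Return the SeqID and prediction lines of records not predicted as `excluded`.

    Assumption: the prediction sits exactly `offset` lines below the SeqID line.
    """
    kept = []
    for i, line in enumerate(lines):
        if not line.startswith(tag) or i + offset >= len(lines):
            continue
        prediction = lines[i + offset]
        fields = prediction.split()
        # Compare the whole first field so "CytoplasmicMembrane" is kept.
        if fields and fields[0] == excluded:
            continue
        kept.append(line)
        kept.append(prediction)
    return kept
```

The result can then be written in one call with `of.writelines(filter_non_cytoplasmic(file_in))`.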
