Reading multiple files in Python with a parser (need a short lesson)

NM_032291 chr1 66999824 67210768 0 SGIP1 4694 6 1.0586e-02
NM_001080397 chr1 8384389 8404227 0 SLC45A1 2401 0 0.0000e+00
NM_018090 chr1 16767166 16786584 0 NECAP2 2081 3673 1.4617e+01
NM_032785 chr1 48998526 50489626 -0 AGBL4 2988 0 0.0000e+00
NM_001145278 chr1 16767166 16786584 0 NECAP2 2003 3534 1.4612e+01
NM_013943 chr1 25071759 25170815 0 CLIC4 4434 5646 1.0545e+01
NM_001145277 chr1 16767166 16786584 0 NECAP2 2005 3504 1.4473e+01
NM_052998 chr1 33546713 33585995 0 ADC 2182 4 1.5182e-02
NM_001195683 chr1 92145899 92351836 -0 TGFBR3 6464 59 7.5590e-02

NM_032291 chr1 66999824 67210768 + SGIP1 4694 44 9.5755e-02
NM_001080397 chr1 8384389 8404227 + SLC45A1 2401 4 1.7018e-02
NM_018090 chr1 16767166 16786584 + NECAP2 2081 1815 8.9095e+00
NM_032785 chr1 48998526 50489626 - AGBL4 2988 4 1.3675e-02
NM_001145278 chr1 16767166 16786584 + NECAP2 2003 1760 8.9760e+00
NM_013943 chr1 25071759 25170815 + CLIC4 4434 3859 8.8906e+00
NM_001145277 chr1 16767166 16786584 + NECAP2 2005 1719 8.7581e+00
NM_052998 chr1 33546713 33585995 + ADC 2182 14 6.5543e-02
NM_001195683 chr1 92145899 92351836 - TGFBR3 6464 49 7.7436e-02

#!/usr/bin/python
import sys
import re
import os
import tempfile
import subprocess
import math
from optparse import OptionParser, OptionGroup

VERSION = "1.0 "

########process the options##########
usage = "usage: %prog -l <FILE> -i <FILE>,<FILE>,<FILE>....... -n <STRING> "
parser = OptionParser(usage=usage)
parser.add_option("-l", "--genelist", dest="input_file", help="gene list file, one string per line", metavar="FILE")
parser.add_option("-i", "--rnaseq", dest="data_file", help="RNASeq files (separated by commas) generated from Arjen's Script", metavar="FILE")
parser.add_option("-n", "--name", dest="name", help="Name of output file", metavar="STRING")
(options, args) = parser.parse_args()

####check whether all files & scripts are present####
if not options.input_file or not options.data_file or not options.name:
    parser.print_help()
    sys.exit(0)

####reading input file ######
for item in open(options.input_file):
    item = item.replace("\n", "")

#######reading of data file and matching the components and assembling in final file##########

This is where I am lost: I don't know how to do it. If there is more than one data file, the names will be separated by commas. I have done a similar thing with a quick-and-dirty solution for a single data file; the code is below (in case it's needed):

#! /usr/bin/python
inputfile = "genelist.txt"
rnafile = "datafile.txt"
for item in open(inputfile):
    item = item.replace("\n", "")
    for line in open(rnafile):
        line = line.split("\t")
        if line[5] == item:
            print(line[5] + "\t" + line[8].replace("\n", ""))
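The step the question gets stuck on, turning the comma-separated `-i` value into a list of files, can be sketched as follows. This is a minimal sketch rather than the asker's code: the helper name `parse_args` and the simplified long option names `--genelist`/`--rnaseq` are assumptions. It lets optparse read `-i` as one plain string and then splits that string on commas.

```python
from optparse import OptionParser


def parse_args(argv=None):
    # Hypothetical helper mirroring the question's -l / -i / -n options.
    parser = OptionParser(usage="usage: %prog -l <FILE> -i <FILE>,<FILE>,... -n <STRING>")
    parser.add_option("-l", "--genelist", dest="input_file", metavar="FILE",
                      help="gene list, one name per line")
    parser.add_option("-i", "--rnaseq", dest="data_file", metavar="FILE",
                      help="RNASeq files, separated by commas")
    parser.add_option("-n", "--name", dest="name", metavar="STRING",
                      help="name of output file")
    options, args = parser.parse_args(argv)  # argv=None means sys.argv[1:]

    if not (options.input_file and options.data_file and options.name):
        parser.error("-l, -i and -n are all required")  # prints usage, exits

    # "-i a.txt,b.txt" arrives as a single string: split it on commas,
    # trim stray spaces and drop empty pieces (e.g. from a trailing comma).
    data_files = [p.strip() for p in options.data_file.split(",") if p.strip()]
    return options, data_files
```

For example, `parse_args(["-l", "genelist.txt", "-i", "a.txt,b.txt", "-n", "out"])` returns the options plus `["a.txt", "b.txt"]` (`a.txt` and `b.txt` are placeholder names). The loop over data files then works the same way for one file or many.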

1 Answer

User

#1 · Posted 2024-09-30 08:38:17

OK, below is my untested code. It should do just what you're looking for, bar exception handling and some possible formatting issues:

# edit - strip the trailing \n from each gene name; a set makes "in" lookups fast
valid_items = {line.strip() for line in open('input')}

# open both files with "with" so they are closed when the loop ends
with open('dictionary1') as dict1, open('dictionary2') as dict2:

    for dict2_line in dict2:
        dict1_line = dict1.readline()

        # protect against dict1 being shorter
        if dict1_line == '':
            break

        fields1 = dict1_line.split()
        if fields1[5] in valid_items and int(fields1[7]) > 5:
            fields2 = dict2_line.split()
            print(fields1[5].ljust(8) + fields1[8] + '  ' + fields2[8])
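The code above pairs exactly two files line by line, which relies on both files listing the same transcripts in the same order (true for the two samples in the question). For any number of files from `-i`, one alternative, a swapped-in technique rather than the answerer's, is to index every file by transcript ID in a dictionary. This is a sketch under assumptions: each data line has the nine whitespace-separated columns shown in the samples, with the transcript ID in column 1, the gene name in column 6 and the value in column 9. It keys on transcript ID rather than gene name because NECAP2 appears under three accessions (NM_018090, NM_001145278, NM_001145277). The function names `merge_values` and `print_table` are hypothetical.

```python
def merge_values(gene_file, data_files):
    """Collect column 9 for every transcript whose gene (column 6) is listed.

    Returns {transcript_id: (gene, [value_in_file1, value_in_file2, ...])};
    a transcript missing from a file gets None in that file's slot.
    """
    with open(gene_file) as fh:
        wanted = {line.strip() for line in fh if line.strip()}

    merged = {}
    for i, path in enumerate(data_files):
        with open(path) as fh:
            for line in fh:
                fields = line.split()  # any whitespace, newline dropped
                if len(fields) < 9 or fields[5] not in wanted:
                    continue  # skip short/blank lines and unlisted genes
                transcript, gene, value = fields[0], fields[5], fields[8]
                # First time we see this transcript: one empty slot per file.
                entry = merged.setdefault(transcript, (gene, [None] * len(data_files)))
                entry[1][i] = value
    return merged


def print_table(merged):
    # One tab-separated row per transcript: ID, gene, then one value per file.
    for transcript, (gene, values) in sorted(merged.items()):
        cells = [v if v is not None else "NA" for v in values]
        print("\t".join([transcript, gene] + cells))
```

Usage would be along the lines of `print_table(merge_values(options.input_file, data_files))`, with `data_files` being the list split out of the `-i` option. A count filter like the answer's `int(fields1[7]) > 5` could be added next to the `fields[5] not in wanted` check if it is needed.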

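Note that calling split() without an argument splits on any run of whitespace, never produces empty fields, and also drops the trailing newline. That is probably what you want, since the delimiters in your example are inconsistent.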
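A quick illustration of that point, using one of the sample rows with its delimiters deliberately mixed (a tab in one place, a double space in another) to mimic inconsistent input:

```python
row = "NM_052998 chr1\t33546713  33585995 0 ADC 2182 4 1.5182e-02\n"

# split("\t") breaks only on tabs: space-separated columns stay glued
# together and the trailing newline stays attached to the last field.
tab_fields = row.split("\t")

# split() with no argument breaks on any run of spaces/tabs/newlines,
# yields no empty strings and drops the trailing "\n".
ws_fields = row.split()

print(len(tab_fields), repr(tab_fields[-1]))
# 2 '33546713  33585995 0 ADC 2182 4 1.5182e-02\n'
print(len(ws_fields), ws_fields[5], ws_fields[8])
# 9 ADC 1.5182e-02

# Two tabs in a row give an empty field with split("\t"), but not with split().
print("a\t\tb".split("\t"), "a\t\tb".split())
# ['a', '', 'b'] ['a', 'b']
```

This is also why the question's `line[8].replace("\n","")` is unnecessary once split() is used.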

Hope this helps!
