In Python, extract lines from another file and save the unique fields in separate files

#! /usr/bin/env python
import gzip

lookup = dict()
my_file = open("bed.txt","r")
for line in my_file.readlines():
    row = line.split()
    lookup[row[3]] = row[1:]
# print lookup
my_file.close()

with open('MyOutFile', 'w') as outfile:
    with gzip.open("values.gz", "r") as eqtl:
        for line in eqtl.readlines():
            for key in lookup:
                if line.find(key) > -1:
                    outfile.write(line)

import os  # needed for os.path.exists below

with gzip.open("values.gz", "r") as eqtl:
    for line in eqtl.readlines():
        row = line.split()
        gene = row[1]
        filename = gene+'.txt'
        if gene in lookup:
            # assign unique file name here, then
            if os.path.exists(filename):
                append_write = 'a'
            else:
                append_write = 'w'
            with open(filename,append_write) as outfile:
                outfile.write(line)

1 Answer

User

Answer #1 · Posted on 2024-10-02 00:28:06

There are two things you can do here.

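First, it looks like the keys of your lookup table are the gene IDs from the first file (row[3]). If the corresponding field in the second file has the same type and value, you can test for membership directly instead of scanning every key for each line, which is much more efficient: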

for line in eqtl:
    row = line.split()
    if row[1] in lookup:
        # do something...
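
To make the difference concrete, here is a minimal sketch. It assumes whitespace-separated lines whose second field is the gene ID; the helper name `matching_lines` and the sample data are made up for illustration. Besides dropping the inner loop over keys, an exact dictionary lookup also avoids accidental substring hits: `line.find("GENE1")` would match a line for `GENE10` as well.

```python
def matching_lines(lines, lookup):
    """Return the lines whose second whitespace-separated field is a key in lookup."""
    matched = []
    for line in lines:
        row = line.split()
        # Skip blank or short lines that have no second field.
        if len(row) > 1 and row[1] in lookup:
            matched.append(line)
    return matched


# Hypothetical data: lookup keyed by gene ID, as it would be built from bed.txt.
lookup = {"GENE1": ["100", "200"]}
lines = ["rs1 GENE1 0.5\n", "rs2 GENE10 0.1\n", "rs3 GENE2 0.9\n"]
result = matching_lines(lines, lookup)
```

With this sample input only the `GENE1` line is kept; the `GENE10` line is not, even though it contains `GENE1` as a substring.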

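Second, if you want a uniquely named file for each gene, opening the output file should happen inside the loop over lines rather than outside it. Like this: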

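with gzip.open("values.gz", "r") as eqtl:
    for line in eqtl.readlines():
        row = line.split()
        if row[1] in lookup:
            # assign unique file name here, then
            filename = row[1] + '.txt'  # e.g. name the file after the gene
            # 'a' appends, so earlier lines for the same gene are kept
            with open(filename, 'a') as outfile:
                outfile.write(line)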

The details depend on how you want to assign a unique file name to each gene. Perhaps you could use other data from the line?
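
Putting both points together, one possible sketch for Python 3 is shown below. It assumes the gene ID is the second whitespace-separated field and that gene IDs are safe to use as file names; the function name `split_by_gene` and the `out_dir` parameter are hypothetical. The first write for a gene in a given run uses mode `'w'` and later writes use `'a'`, so rerunning the script does not duplicate lines and no `os.path.exists` check is needed.

```python
import gzip
import os


def split_by_gene(values_path, lookup, out_dir="."):
    """Write each matching line of a gzipped file to <gene>.txt in out_dir."""
    written = set()
    # "rt" opens the gzip file in text mode, so lines are str in Python 3.
    with gzip.open(values_path, "rt") as eqtl:
        for line in eqtl:
            row = line.split()
            if len(row) < 2 or row[1] not in lookup:
                continue
            gene = row[1]
            filename = os.path.join(out_dir, gene + ".txt")
            # Truncate on the first write for this gene, append afterwards.
            mode = "a" if gene in written else "w"
            with open(filename, mode) as outfile:
                outfile.write(line)
            written.add(gene)
    return sorted(written)
```

Reopening the file for every line is simple but slow for large inputs; if the number of distinct genes is small, keeping a dictionary of open file handles would be a reasonable alternative.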
