How to get the complete line when comparing two files with Python or sh

Published 2024-09-29 19:31:19


unique.txt contains 2 columns separated by tabs. total.txt contains 3 columns, also separated by tabs.

I take each line from unique.txt and look for it in total.txt. If it is present, I want to extract the entire line from total.txt and save it to a new output file.

###Total.txt
column a        column b                    column c
interaction1    mitochondria_205000_225000  mitochondria_195000_215000
interaction2    mitochondria_345000_365000  mitochondria_335000_355000
interaction3    mitochondria_345000_365000  mitochondria_5000_25000
interaction4    chloroplast_115000_128207   chloroplast_35000_55000
interaction5    chloroplast_115000_128207   chloroplast_15000_35000
interaction15   2_10515000_10535000         2_10505000_10525000

###Unique.txt
column a                    column b
mitochondria_205000_225000  mitochondria_195000_215000
mitochondria_345000_365000  mitochondria_335000_355000
mitochondria_345000_365000  mitochondria_5000_25000
chloroplast_115000_128207   chloroplast_35000_55000
chloroplast_115000_128207   chloroplast_15000_35000
mitochondria_185000_205000  mitochondria_25000_45000
2_16595000_16615000         2_16585000_16605000
4_2785000_2805000           4_2775000_2795000
4_11395000_11415000         4_11385000_11405000
4_2875000_2895000           4_2865000_2885000
4_13745000_13765000         4_13735000_13755000

2 answers

Here is my Python script:

file = open('total.txt')
file2 = open('unique.txt')
all_content = file.readlines()
all_content2 = file2.readlines()
store_id_lines = []
ff = open('match.dat', 'w')

for i in range(len(all_content)):
    line = all_content[i].split('\t')
    seq = line[1] + '\t' + line[2]
    for j in range(len(all_content2)):
        if all_content2[j] == seq:
            ff.write(seq)
            break

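But it does not give the desired output (the column a values of the rows that satisfy the if condition). What I want is: if a line of unique.txt equals columns b and c of the i-th line of total.txt, then write the i-th line of total.txt to the new file.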
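One likely reason the script does not produce full rows is that it writes `seq` (columns b and c only) rather than the original line from total.txt. The exact string comparison also depends on both files using identical line endings. Below is a minimal sketch of one possible fix, assuming both files are tab-separated; the function name `write_matching_lines` and its parameters are hypothetical, not part of the original script.

```python
def write_matching_lines(total_path, unique_path, out_path):
    """Copy each line of total_path whose columns b and c form a row of unique_path.

    Returns the number of lines written. All names here are illustrative.
    """
    # Collect the (column a, column b) pairs of unique.txt in a set for fast lookups.
    with open(unique_path) as unique_file:
        wanted = {
            tuple(line.rstrip("\r\n").split("\t")[:2])
            for line in unique_file
            if line.strip()
        }

    written = 0
    with open(total_path) as total_file, open(out_path, "w") as out_file:
        for line in total_file:
            fields = line.rstrip("\r\n").split("\t")
            # Compare columns b and c of total.txt with the pairs from unique.txt.
            if len(fields) >= 3 and (fields[1], fields[2]) in wanted:
                # Write the whole row, column a included.
                out_file.write("\t".join(fields) + "\n")
                written += 1
    return written
```

Calling `write_matching_lines('total.txt', 'unique.txt', 'match.dat')` would then write the full matching rows. Stripping the line endings before comparing avoids mismatches on the last line of a file or between Windows and Unix line endings.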

This should work.

total = "C:\\...total.txt"  # set the path to your file!
unique = "C:\\...unique.txt"
newfile = "C:\\...match.csv"

a = []
b = []

with open(total, "r") as rcursor1:  # read the document
    for trow in rcursor1:  # read each row
        row1 = trow.rstrip("\r\n").split("\t")  # drop the line ending, then split on the separator
        a.append(row1[1:])  # we are only interested in everything from column b onwards

with open(unique, "r") as rcursor2:
    for urow in rcursor2:
        row2 = urow.rstrip("\r\n").split("\t")
        b.append(row2)

print("This is a", a)
print(len(a))
print("This is b", b)
print(len(b))

a1 = set(map(tuple, a))  # lists are not hashable, and set elements must be hashable,
b1 = set(map(tuple, b))  # so convert each list to a tuple, which is hashable

matches = a1.intersection(b1)  # find the rows common to both files
print("Our matches, unsorted!", matches)

# note: only the matched columns (b and c) are written, not column a
with open(newfile, "w") as wcursor:  # write to file
    for i in matches:
        d = ",".join(i)
        print(d)
        wcursor.write(d + "\n")
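
The set intersection yields only the shared (column b, column c) pairs, so column a of total.txt is lost. A possible variation, sketched here under the assumption that each row has already been split on tabs, indexes total.txt by that pair so the full rows can be recovered. The helper names `index_by_pair` and `full_rows_for_matches` are hypothetical.

```python
from collections import defaultdict


def index_by_pair(total_rows):
    """Map (column b, column c) to the list of full rows that carry that pair."""
    index = defaultdict(list)
    for row in total_rows:
        if len(row) >= 3:
            index[(row[1], row[2])].append(row)
    return index


def full_rows_for_matches(total_rows, unique_rows):
    """Return the full rows of total_rows whose (b, c) pair occurs in unique_rows."""
    index = index_by_pair(total_rows)
    unique_pairs = {tuple(row[:2]) for row in unique_rows}
    # Dictionary key views support set operations, so this is the same intersection as before.
    matches = index.keys() & unique_pairs
    # Sort the pairs so the output order is reproducible.
    return [row for pair in sorted(matches) for row in index[pair]]


total_rows = [
    ["interaction1", "mitochondria_205000_225000", "mitochondria_195000_215000"],
    ["interaction15", "2_10515000_10535000", "2_10505000_10525000"],
]
unique_rows = [
    ["mitochondria_205000_225000", "mitochondria_195000_215000"],
    ["4_2785000_2805000", "4_2775000_2795000"],
]
result = full_rows_for_matches(total_rows, unique_rows)
```

Keeping a list per key means that several rows of total.txt sharing the same pair are all preserved.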
