Using Python to search one file's contents with the contents of a second file

Posted 2024-10-08 22:29:53


I have the following code, which compares the items in the first column of input file 1 with the contents of input file 2:

import os

newfile2=[]
outfile=open("outFile.txt","w")
infile1=open("infile1.txt", "r")
infile2=open("infile2.txt","r")
for file1 in infile1:
    #print file1
    file1=str(file1).strip().split("\t")
    print file1[0]
    for file2 in infile2:
        if file2 == file1[0]:
            outfile.write(file2.replace(file2,file1[1]))
        else:
            outfile.write(file2)

Input file 1:

Modex_xxR_SL1344_3920   Modex_sseE_SL1344_3920
Modex_seA_hemN  Modex_polA_SGR222_3950
Modex_GF2333_3962_SL1344_3966   Modex_ertd_wedS

Input file 2:

Sardes_xxR_SL1344_4567  
Modex_seA_hemN
MOdex_uui_gytI

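Because the item in input file 1 (column 1, row 2) matches the item in input file 2 (row 2), the column-2 item from input file 1 should replace that input file 2 item in the output file, like this (required output):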

Sardes_xxR_SL1344_4567  
Modex_polA_SGR222_3950
MOdex_uui_gytI

So far, my code only outputs the items from input file 1. Could someone help me modify this code? Thanks.
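Two standard Python behaviours are likely relevant to the loop above. A file object can only be iterated once, so the inner `for file2 in infile2` loop consumes infile2 during the first outer iteration and finds nothing on later ones. Also, each line read from a file keeps its trailing newline, so `file2 == file1[0]` compares `"Modex_seA_hemN\n"` with `"Modex_seA_hemN"`. A minimal sketch of both, assuming small in-memory `io.StringIO` objects (hypothetical data) in place of the real files:

```python
import io

# Hypothetical in-memory stand-ins for infile1.txt and infile2.txt
infile1 = io.StringIO("a\tA\nb\tB\n")
infile2 = io.StringIO("x\nb\n")

passes = []
for row in infile1:
    # Count how many lines of infile2 the inner loop sees on each outer pass
    seen = [line for line in infile2]
    passes.append(len(seen))

print(passes)  # [2, 0]: on the second pass infile2 is already exhausted

# Lines read from a file keep their trailing newline, so a direct comparison fails
print("b\n" == "b")          # False
print("b\n".strip() == "b")  # True
```

Reading infile1 into a dictionary first, as in the answer below, avoids re-reading infile2 altogether.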


1 answer
User
#1 · Posted 2024-10-08 22:29:53

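It looks like you have a TSV file, so let's treat it as one. We'll build a TSV reader, csv.reader(fileobj, delimiter="\t"), that iterates over infile1, and build a translation dict from it. The dict gets one entry per row, with the first column as the key and the second column as the value.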

Then, using dict.get, we can translate each line of infile2 if it exists in our translation dict, or simply write the line itself if no translation is available.

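import csv

with open("infile1.txt", "r", newline="") as infile1, \
     open("infile2.txt", "r") as infile2, \
     open("outfile.txt", "w") as outfile:
    # Map column 1 -> column 2 for every row of infile1
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

    for line in infile2:
        key = line.strip()
        # Write the translation if one exists, otherwise the original item
        outfile.write(trans_dict.get(key, key) + "\n")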

Result:

# contents of outfile.txt
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
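To check the dict(csv.reader(...)) plus dict.get approach without touching the disk, here is a sketch that runs the same logic on the question's sample data. It assumes the two columns of input file 1 are separated by a single tab (the page shows spaces) and uses `io.StringIO` in place of the real files:

```python
import csv
import io

# Sample data from the question; assumes infile1's columns are tab-separated
infile1 = io.StringIO(
    "Modex_xxR_SL1344_3920\tModex_sseE_SL1344_3920\n"
    "Modex_seA_hemN\tModex_polA_SGR222_3950\n"
    "Modex_GF2333_3962_SL1344_3966\tModex_ertd_wedS\n"
)
infile2 = io.StringIO(
    "Sardes_xxR_SL1344_4567  \n"
    "Modex_seA_hemN\n"
    "MOdex_uui_gytI\n"
)

# Each two-column row becomes one key -> value pair
trans_dict = dict(csv.reader(infile1, delimiter="\t"))

# dict.get(key, default) falls back to the original item when there is no match
result = [trans_dict.get(line.strip(), line.strip()) for line in infile2]
print("\n".join(result))
```

This prints the three lines of the required output. Note that dict(...) raises a ValueError if any row of infile1 does not have exactly two columns (for example, a blank line in the middle of the file).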

Edit based on your comment:

import csv

with open("infile1.txt", "r", newline="") as infile1:
    # build our translation dict
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

# open the file to translate and our output file
with open("infile2.txt", "r", newline="") as infile2, \
     open("outfile.txt", "w") as outfile:
    # treat our file to translate like a tsv file instead of flat text
    reader = csv.reader(infile2, delimiter="\t")
    for line in reader:
        # map each column through trans_dict, writing the whole row
        # back re-tab-delimited with a trailing newline
        outfile.write("\t".join(trans_dict.get(col, col) for col in line) + "\n")
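The edited version translates every tab-separated column of infile2 independently instead of whole lines. A sketch of that per-column behaviour, using a hypothetical one-entry translation dict and hypothetical two-column input held in `io.StringIO`:

```python
import csv
import io

# Hypothetical translation dict and multi-column input, for illustration only
trans_dict = {"Modex_seA_hemN": "Modex_polA_SGR222_3950"}
infile2 = io.StringIO(
    "Modex_seA_hemN\tMOdex_uui_gytI\n"
    "Sardes_xxR_SL1344_4567\tModex_seA_hemN\n"
)

out = io.StringIO()
for row in csv.reader(infile2, delimiter="\t"):
    # Translate each column on its own, leaving unmatched columns unchanged
    out.write("\t".join(trans_dict.get(col, col) for col in row) + "\n")

print(out.getvalue(), end="")
```

A matching item is replaced wherever it appears in a row, not only in the first column.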
