How do I fill dictionary values from another file?

Posted 2024-10-04 15:26:31


I have two files (the fields on each line are separated by spaces):

file1.txt

OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome

file2.txt

UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007 
UniRef90_2 OTU0002 OTU0003 OTU0005 
UniRef90_3 OTU0004 OTU0006 OTU0007 

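In the second file, I want to replace each OTUXXXX with its value from the first file. I need to keep UniRef90_X at the start of each line. The first line of the second file should end up like this: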

UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007) 

So far, I have built a dictionary for the second file, with UniRef90_X as the key and the OTUXXXX entries as the value.

f1=open("file1.txt", "r")
f2=open("file2.txt", "r")

dict={}
for i in f2:
    i=i.split(" ")
    dict[i[0]]=i[1:]
    for j in f1:
        j=j.split(" ")
        if j[0] in dict.values():
            dico[i[0]]=j[1:]

But I don't know how to replace each OTUXXXX with the corresponding value from the first file. Any ideas?


2 Answers

First, don't name a variable after a built-in type such as dict. Ever. Use something like d2 instead.

Then replace [1] with [1:], so that every OTU ID on the line is kept rather than only the first one.
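A small illustration of that difference, using the first line of file2.txt as sample input. Calling split() with no argument is an assumption added here (the question uses split(" ")); it splits on any whitespace and so also drops the trailing space and newline.

```python
# Sample line copied from file2.txt, including its trailing space and newline
line = "UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007 \n"

# split() without an argument splits on runs of whitespace and
# discards leading/trailing whitespace, so no "\n" item is left over
fields = line.split()

first_only = fields[1]   # a single string: only the first OTU ID
all_ids = fields[1:]     # a list holding every OTU ID on the line

print(first_only)
print(all_ids)
```

With split(" ") instead, the last item of the list would be the newline character itself, which would then end up among the OTU IDs.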

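Then, once you have loaded the first file into a dictionary the same way you loaded the second one (let's call it d1), you can combine the values like this: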

d3 = {}
for key in d2:
    values = []
    for otu in d2[key]:
        # look up the taxonomy string for each OTU ID
        values.append(d1[otu])
    d3[key] = " ".join(values)  # format your list here

Finally, convert it back to strings and write them to a file.
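The steps of this answer can be put together as a minimal sketch. The function names load_taxonomy, load_clusters and combine are hypothetical, the sample lines are a subset of the question's data held in lists rather than read from disk, and the "(#ID)" suffix follows the output format requested in the question.

```python
def load_taxonomy(lines):
    """Build d1: OTU ID -> taxonomy string."""
    d1 = {}
    for line in lines:
        # partition splits at the first space only, so spaces inside
        # the taxonomy string are preserved
        otu, _, taxonomy = line.strip().partition(" ")
        if otu:
            d1[otu] = taxonomy
    return d1


def load_clusters(lines):
    """Build d2: UniRef90 ID -> list of OTU IDs."""
    d2 = {}
    for line in lines:
        fields = line.split()
        if fields:
            d2[fields[0]] = fields[1:]
    return d2


def combine(d1, d2):
    """Build d3: UniRef90 ID -> formatted string of taxonomy entries."""
    d3 = {}
    for key, otus in d2.items():
        # IDs missing from d1 are skipped here; that is a design choice
        parts = [f"{d1[otu]} (#{otu})" for otu in otus if otu in d1]
        d3[key] = " ".join(parts)
    return d3


# Sample data taken from the question (shortened)
file1_lines = [
    "OTU0001 Archaea\n",
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n",
]
file2_lines = ["UniRef90_1 OTU0001 OTU0004 \n"]

d3 = combine(load_taxonomy(file1_lines), load_clusters(file2_lines))
for key, text in d3.items():
    print(key, text)
```

To work on the real files, the two lists could be replaced by open file objects, since both loaders only iterate over lines.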

I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up the IDs captured from file1.

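The way your loops are set up, you read the first record from file2 and put it into the dictionary. The membership test will never match anything from file1, because the dictionary's values are lists of IDs while each ID read from file1 is a single string. You then read all of file1 and do some work there. The next time you read a record from file2, file1 has nothing left to give: it was read to the end during the first iteration over file2.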
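Both problems can be reproduced in a few lines. This is only a sketch: io.StringIO stands in for an open text file so that nothing has to exist on disk, and the variable names are made up for the demonstration.

```python
import io

# io.StringIO behaves like an open text file for iteration purposes
f1 = io.StringIO(
    "OTU0001 Archaea\n"
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n"
)

first_pass = [line.split()[0] for line in f1]
second_pass = [line.split()[0] for line in f1]  # already at end of file

print(first_pass)
print(second_pass)

# Rewinding makes the lines available again
f1.seek(0)
third_pass = [line.split()[0] for line in f1]
print(third_pass)

# The membership test from the question compares a string with lists
d2 = {"UniRef90_1": ["OTU0001", "OTU0004"]}
id_in_values = "OTU0001" in d2.values()
print(id_in_values)
```

Rewinding with seek(0) inside the outer loop would work, but it rereads file1 once per line of file2, which is why loading file1 into a dictionary a single time is the more natural fix.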

Here is one way to read file1 into a dictionary and then print each entry when a matching ID is found in file2.

file1 = {} # declare a dictionary

fin = open('f1.txt', 'r')

for line in fin:
    # strip the ending newline
    line = line.rstrip()

    # only split once
    # first part into _id and second part into data
    _id, data = line.split(' ', 1)

    # data here is a single string possibly containing spaces
    # because only split once (above)
    file1[_id] = data

fin.close()

fin = open('f2.txt', 'r')

for line in fin:
    uniref, *ids = line.split() # here ids is a list (because prepended by *)

    print(uniref, end='')
    for _id in ids:
        if _id in file1:
            print(' ', file1[_id], '(#' + _id + ')', end='')
    print()

fin.close()

The printed output is:

UniRef90_1  Archaea (#OTU0001)  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2  Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002)  Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
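Note that the printed lines contain two spaces between entries, because print(' ', ...) adds its own separator after the explicit space. If the result should instead go to a file with single spaces, as in the format the question asks for, a variant along these lines may help. It is a sketch, not the answerer's code: the function name annotate and the output file name annotated.txt are hypothetical, and temporary files are used only to keep the example self-contained.

```python
import os
import tempfile


def annotate(taxonomy_path, clusters_path, out_path):
    """Copy clusters_path to out_path, expanding each OTU ID to 'taxonomy (#ID)'."""
    taxonomy = {}
    with open(taxonomy_path) as fin:
        for line in fin:
            # split at the first space only; the rest is the taxonomy string
            otu, _, data = line.rstrip("\n").partition(" ")
            if otu:
                taxonomy[otu] = data

    with open(clusters_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            uniref, ids = fields[0], fields[1:]
            # unknown IDs are silently skipped, as in the answer's code
            parts = [f"{taxonomy[i]} (#{i})" for i in ids if i in taxonomy]
            fout.write(" ".join([uniref] + parts) + "\n")


# Demonstration with a subset of the question's data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "annotated.txt")
    with open(p1, "w") as f:
        f.write("OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n")
        f.write("OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured\n")
    with open(p2, "w") as f:
        f.write("UniRef90_3 OTU0004 OTU0006 \n")
    annotate(p1, p2, out)
    with open(out) as f:
        result = f.read()

print(result, end="")
```

The with statements close each file automatically, so the explicit close() calls are no longer needed. Unlike the example line in the question, the lines written here carry no trailing space.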
