How can I repeatedly update a Python dictionary without losing the original data for keys from another dictionary?

Posted 2024-10-16 17:19:15

Male | A programmer who enjoys writing Python code.

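I am trying to update a dictionary built from one txt file with information from a second dictionary built from another file. My problem is that every time I try to update it, my output is cut down to a "single dictionary output": {my updated output}, instead of the expected {my updated output},{my updated output}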

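I first tried merging the dictionaries, which basically gave me two dictionaries side by side. I then tried updating with dictionary1.update(dictionary2[key]), which gave me the "single dictionary output".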
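The two attempts described above can behave quite differently depending on which dictionary update is called on. The following is a minimal Python 3 sketch (the question's own code uses the Python 2 print statement); the keys and values are made up for illustration and are not the asker's data.

```python
# Hypothetical toy data; keys and values are invented for illustration.
dictionary1 = {'gene1': {'accn': 'AB000001', 'seq': 'ATGC'}}
dictionary2 = {'AB000001': {'host': 'human', 'isolate': 'X1'}}

# Calling update on the OUTER dict puts host/isolate at the top level,
# side by side with the gene entries, not inside any gene's record.
flat = dict(dictionary1)
flat.update(dictionary2['AB000001'])

# Calling update on the INNER dict adds the fields to that one record.
nested = {gene: dict(record) for gene, record in dictionary1.items()}
nested['gene1'].update(dictionary2[nested['gene1']['accn']])

print(flat)
print(nested)
```

In this sketch only the second form yields a record holding accn, seq, host, and isolate together, so it may matter whether update is called on the outer dictionary or on the sub-dictionary of a single gene.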

import re
import os
import glob
asps = []

gbFileNames = list(glob.glob(os.path.join('/Users/schneider/Downloads/Reilly/*.gb')))

gbDict = {}

for myfile in gbFileNames:
    currentfile = open(myfile, 'r')
    for line in currentfile:
        if 'ACCESSION' in line: 
            accn = line.split(' ')[-1].rstrip()
            gbDict[accn] = {'host':'','isolate':''}
        elif 'host=' in line: 
            gbDict[accn]['host'] += line.split('"')[1]
        elif 'isolate=' in line: 
            gbDict[accn]['isolate'] += line.split('"')[1]

seqFileNames = list(glob.glob(os.path.join('/Users/schneider/Downloads/Reilly/*.txt')))

fastaDict = {}

for myfile in seqFileNames:
    currentfile = open(myfile, 'r')
    for line in currentfile:
        if '>' in line:
            # DEFINE GENE ID
            pseudoGeneID = re.search(r'(?<=gene)\w{1,}', line)
            GeneID = pseudoGeneID.group(0)
            # fastaDict[GeneID] = {'accn':'','host':'','isolate':'','seq':''} #initiate subdictionary after introducing GeneID variable
            fastaDict[GeneID] = {'accn':'','seq':''} #initiate subdictionary after introducing GeneID variable
            # DEFINE TAXON by accession number
            accn = line.split('|')[1].split('.')[0]
            fastaDict[GeneID]['accn'] += accn.rstrip() # assign accession ID to dictionary using += refer to rstrip down below :)
        else:
            seq = line # here we basically say that if it doesnt start with > we assume it must be a sequence, thus we call the line a seq to make more sense :) 
            fastaDict[GeneID]['seq'] += seq.rstrip()  # rstrip is used here to guarantee that any crap will not come along with your nice sequence data


    fastaDict[GeneID].update(gbDict[accn])  
print fastaDict[GeneID]

fastaDict output = GeneID{accn;seq}
gbDict output = accn{host;isolate}

Expected result:

updatedDict output = GENEID{accn;seq;host;isolate}
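One possible reading of the code above is that the update call runs once per file, after the inner loop has finished, so only the last GeneID seen in each file receives host and isolate, and print shows only that last record. The sketch below (Python 3) moves the merge inside the loop so that every record is updated. It assumes in-memory lines in place of the real files and a hypothetical header layout of `>gene<ID>|<accn>.<version>|`, since the actual file contents are not shown in the question.

```python
import re

# Hypothetical stand-ins for the parsed GenBank data and the FASTA lines.
gb_dict = {
    'AB000001': {'host': 'human', 'isolate': 'X1'},
}
fasta_lines = [
    '>geneA|AB000001.1|\n',
    'ATGC\n',
    'GGTT\n',
    '>geneB|AB000001.1|\n',
    'CCAA\n',
]

fasta_dict = {}
gene_id = None
for line in fasta_lines:
    if line.startswith('>'):
        # Header line: extract the gene ID and the accession number.
        gene_id = re.search(r'(?<=gene)\w+', line).group(0)
        accn = line.split('|')[1].split('.')[0]
        record = {'accn': accn, 'seq': ''}
        # Merge host/isolate as soon as the record is created;
        # .get() avoids a KeyError when an accession has no GenBank entry.
        record.update(gb_dict.get(accn, {'host': '', 'isolate': ''}))
        fasta_dict[gene_id] = record
    elif gene_id is not None:
        # Sequence line: append it to the current record.
        fasta_dict[gene_id]['seq'] += line.rstrip()

for gene_id, record in fasta_dict.items():
    print(gene_id, record)
```

With the merge done per header line, each GeneID entry ends up with accn, seq, host, and isolate, which matches the shape given as the expected result.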

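Note: GeneID is not unique, because multiple files will contain the same GeneID; 'accn' combined with GeneID is unique. Ultimately, we want to output one fasta file per gene, containing multiple accession numbers. 'accn' is repeated many times, because a single accn (equivalent to a single genome) yields multiple GeneIDs. Host and isolate are accompanying identification data that we want to use in the output line, along with the unique GeneID+accn combination.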

Data structure: one accn has multiple sequences, and each sequence has one GeneID, one host, and one isolate.
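Because GeneID repeats across files, keying a dictionary by GeneID alone would let a later file overwrite an earlier one. One way to keep every record, sketched here in Python 3 with made-up values, is to key on the (GeneID, accn) tuple and group by gene only when producing output. The header layout used in the output line is an assumption, not something specified in the question.

```python
from collections import defaultdict

# Hypothetical records keyed by the unique (GeneID, accn) combination.
records = {
    ('A', 'AB000001'): {'seq': 'ATGC', 'host': 'human', 'isolate': 'X1'},
    ('A', 'AB000002'): {'seq': 'ATGG', 'host': 'bat', 'isolate': 'Y2'},
    ('B', 'AB000001'): {'seq': 'CCAA', 'host': 'human', 'isolate': 'X1'},
}

# Group FASTA entries by gene so each gene can go to its own file.
by_gene = defaultdict(list)
for (gene_id, accn), rec in sorted(records.items()):
    header = '>{}|{}|{}|{}'.format(gene_id, accn, rec['host'], rec['isolate'])
    by_gene[gene_id].append(header + '\n' + rec['seq'] + '\n')

for gene_id, entries in by_gene.items():
    # Printed here; writing to a per-gene file would be the analogous step.
    print('gene', gene_id)
    print(''.join(entries), end='')
```

The tuple key preserves both accessions for gene A, while the grouping step collects them under one gene for output.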

Tags: file in host output dictionary line seq glob

0 answers

No answers yet
