Splitting the values in a dictionary into separate values

Posted 2024-05-03 22:36:54


I have a string of this type:

sheet = """
magenta
turquoise,PF00575
tan,PF00154,PF06745,PF08423,PF13481,PF14520
turquoise, PF00011
NULL
"""

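Each line starts with an identifier (for example tan, magenta, ...), and I want to count how many times each PF number occurs under each identifier.

The final structure should look like this: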

         magenta  turquoise  tan  NULL
PF00575     0         1       0     0
PF00154     0         0       1     0
PF06745     0         0       1     0
PF08423     0         0       1     0
PF13481     0         0       1     0
PF14520     0         0       1     0
PF00011     0         1       0     0

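I started by building a dictionary in which the first word of each line is a key, and I want the PF numbers that follow it to be the values.

With the code below, the values I get are lists of strings rather than separate values in the dictionary: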

from collections import defaultdict

lines = []
lines.append(sheet.split("\n"))
flattened = [val for sublist in lines for val in sublist]
pfams = []
for i in flattened:
    pfams.append(i.split(","))
d = defaultdict(list)
for i in pfams:
    pfam = i[0]
    d[pfam].append(i[1:])  # appends the whole slice as one nested list

The result is:

defaultdict(<type 'list'>, {'': [[], []], 'magenta': [[]], 'NULL': [[]], 'turquoise': [['PF00575']], 'tan': [['PF00154', 'PF06745', 'PF08423', 'PF13481', 'PF14520']]})

How can I split the PF numbers so that they become separate values in the dictionary, and then count the occurrences of each unique PF number under each key?
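
The nested lists in the output above come from `append`, which stores the whole slice `i[1:]` as a single element, whereas `extend` adds each element individually. A minimal sketch of one possible fix, assuming that whitespace around the PF numbers should be stripped and blank lines skipped (the shortened `sheet` and the name `fields` are illustrative and not part of the original code):

```python
from collections import defaultdict

# Same shape as the input above, shortened for illustration.
sheet = """
magenta
turquoise,PF00575
tan,PF00154,PF06745
turquoise, PF00011
NULL
"""

d = defaultdict(list)
for line in sheet.splitlines():
    fields = [field.strip() for field in line.split(",")]
    if not fields[0]:
        continue  # skip blank lines
    # extend() adds each PF number on its own;
    # append() would add the slice as one nested list.
    d[fields[0]].extend(fields[1:])

print(dict(d))
```

Identifiers without any PF number (here `magenta` and `NULL`) still become keys, each with an empty list.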


2 Answers

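Thanks to dwblas on devshed, this is the most efficient way I have found to handle the task:

I built a dictionary keyed by PF number, plus a list of colors in the order I want them printed.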

colors_list= ['cyan','darkorange','greenyellow','yellow','magenta','blue','green','midnightblue','brown','darkred','lightcyan','lightgreen','darkgreen','royalblue','orange','purple','tan','grey60','darkturquoise','red','lightyellow','darkgrey','turquoise','salmon','black','pink','grey','null']
lines = sheet.splitlines()
counts = {}

for line in lines:
    parts = line.split(",")
    if len(parts) > 1:  # only lines that carry at least one PF number
        color = parts[0].strip().lower()
        if color not in colors_list:
            # list.index() raises ValueError for unknown colors, so check first
            print("unknown color: %s" % color)
            continue
        # offset (column position) of this color in the list
        el_number = colors_list.index(color)
        for key in parts[1:]:  # skip the color itself
            key = key.strip()
            if key not in counts:
                # new key: start with one zero per color
                counts[key] = [0] * len(colors_list)
            counts[key][el_number] += 1

import csv

# Python 3: open in text mode with newline=""; on Python 2 use open("out.csv", "wb").
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["PFAM"] + colors_list)
    for pfam in counts:
        writer.writerow([pfam] + counts[pfam])

Use collections.Counter: https://docs.python.org/2/library/collections.html#collections.Counter

import collections

sheet = """
magenta
turquoise,PF00575
tan,PF00154,PF06745,PF08423,PF13481,PF14520
NULL
"""

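acc = {}
for line in sheet.split('\n'):
    line = line.strip()
    if not line or line == "NULL":
        continue
    parts = line.split(',')
    # Counter(parts[1]) would count the characters of a single string
    # (and fail on lines without a comma); count the PF values instead.
    acc[parts[0]] = collections.Counter(parts[1:])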

Edit: this version accumulates all PF values for each key

acc = collections.defaultdict(list)
for line in sheet.split('\n'):
    line = line.strip()
    if not line or line == "NULL":
        continue
    parts = line.split(',')
    acc[parts[0]] += parts[1:]
# .items() works on both Python 2 and Python 3
acc = {k: collections.Counter(v) for k, v in acc.items()}

Final edit: count the occurrences of each color per PF value, which is the result we were after all along:

acc = collections.defaultdict(list)
for line in sheet.split('\n'):
    line = line.strip()
    if not line or line == "NULL":
        continue
    parts = line.split(',')
    for pfval in parts[1:]:
        acc[pfval.strip()] += [parts[0]]
# .items() works on both Python 2 and Python 3
acc = {k: collections.Counter(v) for k, v in acc.items()}
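
To get from these per-PF counters to the grid the question asks for, one option is to remember the identifiers in first-seen order and emit one row per PF number. This is only a sketch under some assumptions: the column order, the sorted row order, the tab-separated output, and the helper name `build_rows` are all illustrative choices, and the `NULL` line is kept as a column rather than skipped.

```python
import collections

sheet = """
magenta
turquoise,PF00575
tan,PF00154,PF06745,PF08423,PF13481,PF14520
turquoise, PF00011
NULL
"""

colors = []  # identifiers in first-seen order (assumed column order)
acc = collections.defaultdict(collections.Counter)  # PF number -> Counter of colors
for line in sheet.splitlines():
    parts = [p.strip() for p in line.split(",")]
    if not parts[0]:
        continue  # skip blank lines
    if parts[0] not in colors:
        colors.append(parts[0])
    for pfval in parts[1:]:
        acc[pfval][parts[0]] += 1


def build_rows(acc, colors):
    """Return one [pf_number, count, count, ...] row per PF number (hypothetical helper)."""
    # A Counter returns 0 for colors that never occurred with this PF number.
    return [[pfval] + [acc[pfval][c] for c in colors] for pfval in sorted(acc)]


rows = build_rows(acc, colors)
print("\t".join(["PFAM"] + colors))
for row in rows:
    print("\t".join(str(cell) for cell in row))
```

Using a `Counter` per PF number avoids pre-allocating a list of zeroes per color, because missing colors simply read as 0.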
