Splitting the values in a dictionary into separate values

         magenta  turquoise  tan  NULL
PF00575  0        0          0    0
PF00154  0        1          0    0
PF06745  0        0          1    0
PF08423  0        0          1    0
PF13481  0        0          1    0
PF14520  0        0          1    0
PF00011  0        1          0    0

from collections import defaultdict

lines = []
lines.append(sheet.split("\n"))
flattened = [val for sublist in lines for val in sublist]
pfams = []
for i in flattened:
    pfams.append(i.split(","))
d = defaultdict(list)
for i in pfams:
    pfam = i[0]
    d[pfam].append(i[1:])

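2 Answers

User

#1 · Edited 2024-05-03 22:36:54

Thanks to dwblas on devshed, this is the most effective way I have found to handle the task:

I built a dictionary keyed by PF number, together with a list of colors in the order I want them printed.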

colors_list = ['cyan','darkorange','greenyellow','yellow','magenta','blue','green','midnightblue','brown','darkred','lightcyan','lightgreen','darkgreen','royalblue','orange','purple','tan','grey60','darkturquoise','red','lightyellow','darkgrey','turquoise','salmon','black','pink','grey','null']
## sheet is assumed to hold the input text, one color per line followed by its keys
lines = sheet.splitlines()
counts = {}

for line in lines:
    parts = line.split(",")
    if len(parts) > 1:
        ## doesn't break out the same item in the list many times
        color = parts[0].strip().lower()
        for key in parts[1:]:  ## skip color
            key = key.strip()
            if key not in counts:
                ## new key and list of zeroes-print it if you want to verify
                counts[key] = [0 for ctr in range(len(colors_list))]

            ## list.index() raises ValueError for a missing item, so test membership first
            if color in colors_list:  ## color found
                ## offset number/location of this color in list
                el_number = colors_list.index(color)
                counts[key][el_number] += 1
            else:
                print("some error message")

import csv

## Python 3: open in text mode with newline="" (Python 2 used "wb" here)
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["PFAM"] + colors_list)
    for pfam in counts:
        writer.writerow([pfam] + counts[pfam])
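
To try the counting and CSV steps end to end without an input file, here is a minimal sketch. It assumes the input has one color per line followed by comma-separated PF IDs, uses a shortened color list, and writes to an in-memory buffer instead of out.csv. The function name count_colors and the sample data are illustrative only, not part of the original answer.

```python
import csv
import io

# Assumed sample input: a color, then zero or more comma-separated PF IDs.
sheet = """magenta
turquoise,PF00575
tan,PF00154,PF06745,PF08423,PF13481,PF14520
NULL
"""

# Shortened color order for this sketch; the real list is much longer.
colors_list = ["magenta", "turquoise", "tan", "null"]


def count_colors(text, colors):
    """Return {pf_id: [count per color, in the order of `colors`]}."""
    counts = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        color = parts[0].lower()
        if len(parts) < 2 or color not in colors:
            continue  # no PF IDs on this line, or an unknown color
        column = colors.index(color)
        for pf_id in parts[1:]:
            counts.setdefault(pf_id, [0] * len(colors))[column] += 1
    return counts


counts = count_colors(sheet, colors_list)

# Write the table to a string buffer so the result can be inspected directly.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["PFAM"] + colors_list)
for pf_id in sorted(counts):
    writer.writerow([pf_id] + counts[pf_id])

csv_text = buffer.getvalue()
print(csv_text)
```

Sorting the keys is a choice made here so the row order is stable; iterating over the dictionary directly works as well if the order does not matter.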

User

#2 · Edited 2024-05-03 22:36:54

Use collections.Counter (https://docs.python.org/2/library/collections.html#collections.Counter):

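import collections

sheet = """
magenta
turquoise,PF00575
tan,PF00154,PF06745,PF08423,PF13481,PF14520
NULL
"""

acc = {}
for line in sheet.split('\n'):
    # skip blank lines (the sheet starts and ends with one) and the NULL line
    if not line or line == "NULL":
        continue
    parts = line.split(',')
    # slice so that a color with no PF values yields an empty Counter
    acc[parts[0]] = collections.Counter(parts[1:])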
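
One detail worth checking when building a Counter from split lines: Counter tallies the items of whatever iterable it receives, and a string is iterated character by character. The short sketch below, with made-up PF IDs, contrasts passing a list of IDs with passing a single ID string.

```python
from collections import Counter

# Made-up IDs for illustration only.
pf_ids = ["PF00154", "PF06745", "PF00154"]

by_item = Counter(pf_ids)     # tallies whole list elements
by_char = Counter(pf_ids[0])  # tallies the characters of one string

print(by_item)
print(by_char)
```

Looking up a key that was never seen returns 0 rather than raising KeyError, which is convenient when filling in a table of counts.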

Edit: this version accumulates all PF values for each key:

acc = collections.defaultdict(list)
for line in sheet.split('\n'):
    # skip blank lines and the NULL line
    if not line or line == "NULL":
        continue
    parts = line.split(',')
    acc[parts[0]] += parts[1:]
# items() replaces the Python 2-only iteritems()
acc = {k: collections.Counter(v) for k, v in acc.items()}

Final edit: count how often each color occurs per PF value, which is the result we were after all along:

acc = collections.defaultdict(list)
for line in sheet.split('\n'):
    if line == "NULL":
        continue
    parts = line.split(',')
    # record this line's color once for every PF value on the line
    for pfval in parts[1:]:
        acc[pfval] += [parts[0]]
# items() replaces the Python 2-only iteritems()
acc = {k: collections.Counter(v) for k, v in acc.items()}
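
To get from a dictionary of Counters to the PF-by-color table in the question, one possible sketch is shown below. The acc dictionary here is a small hand-written stand-in with the same shape as the result above, and to_rows and the column list are hypothetical names chosen for illustration.

```python
import collections

# Hand-written stand-in: PF value -> Counter of the colors it appeared under.
acc = {
    "PF00575": collections.Counter(["turquoise"]),
    "PF00154": collections.Counter(["tan"]),
}

# Column order for the table; assumed here, adjust to the colors you need.
columns = ["magenta", "turquoise", "tan", "NULL"]


def to_rows(acc, columns):
    """Build a header row plus one row of counts per PF value."""
    rows = [["PFAM"] + columns]
    for pf in sorted(acc):
        # A Counter returns 0 for colors it never saw.
        rows.append([pf] + [acc[pf][color] for color in columns])
    return rows


rows = to_rows(acc, columns)
for row in rows:
    print(" ".join(str(cell) for cell in row))
```

The rows can also be passed to csv.writer's writerows() if a file is needed instead of printed text.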
