Python script to convert two columns into a 68 x 150 matrix

Posted 2024-09-29 23:32:01


I'm a PhD student with a data wrangling problem. I have two columns of data in a text file, in the following format:

Site  Species
A01   ACRB
A01   TBL
A02   TBL
A03   GRF   
...

I need to count how many of each species type (e.g. ACRB) occur at each site (e.g. A01), and produce a matrix of about 60 sites by 150 species, like this:

^{pr2}$

I would be very grateful for any suggestions on how best to approach this task, as I am very new to Python.

Thank you, -Elizabeth


3 answers
data = """Site  Species
A01   ACRB
A01   TBL
A02   TBL
A03   GRF
"""

counts = {}
sites = set()
species = set()

# Count each (site, specie) pair, skipping the header line.
for line in data.splitlines()[1:]:
    site, specie = line.split()
    sites.add(site)
    species.add(specie)
    counts[(site, specie)] = counts.get((site, specie), 0) + 1

# Sort so that the row and column order is reproducible.
site_list = sorted(sites)
species_list = sorted(species)

# Print the header row.
print("Site\t" + "\t".join(species_list))

# Print one tab-separated row of counts per site.
for site in site_list:
    row = [str(counts.get((site, specie), 0)) for specie in species_list]
    print(site + "\t" + "\t".join(row))

Let's see...

import itertools

l = [('A01', 'ACRB'), ('A01', 'TBL'), ('A02', 'TBL'), ('A03', 'GRF')]

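def mygrouping(l):
    # Column order: every distinct species, sorted so the output is reproducible.
    speclist = sorted(set(i[1] for i in l))
    yield tuple(speclist)
    # groupby only merges adjacent items, so sort by site first
    # (sorted() leaves the caller's list untouched).
    gr = itertools.groupby(sorted(l), lambda i: i[0])  # i[0] is the site; group by that...
    for site, items in gr:
        counts = [0] * len(speclist)
        for _site, species in items:
            counts[speclist.index(species)] += 1
        yield site, tuple(counts)

print(list(mygrouping(l)))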

Another solution, using namedtuples, is

^{pr2}$

I'll leave the display part to you.

Here is one way to do it in Python 2.7

from collections import Counter
with open("in.txt") as f:
    next(f)  # do this to skip the first row of the file
    c = Counter(tuple(row.split()) for row in f if not row.isspace())

sites = sorted(set(x[0] for x in c))
species = sorted(set(x[1] for x in c))

# Header row, then one tab-separated row of counts per site.
print('Site\t' + '\t'.join(species))
for site in sites:
    print(site + '\t' + '\t'.join(str(c[site, spec]) for spec in species))
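
With about 60 sites and 150 species, the matrix is probably easier to inspect in a spreadsheet than on the terminal. Here is a minimal sketch of the same Counter approach in Python 3 that writes tab-separated text with the standard csv module. The helper names build_matrix and write_matrix and the in-memory sample are only illustrative assumptions, not part of the answers above; to get a real file, replace the StringIO buffer with an opened file.

```python
import csv
import io
from collections import Counter


def build_matrix(lines):
    """Return (species, rows); each row is [site, count, count, ...]."""
    pairs = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and the "Site Species" header.
        if len(fields) != 2 or fields == ["Site", "Species"]:
            continue
        pairs[tuple(fields)] += 1

    sites = sorted({site for site, _ in pairs})
    species = sorted({spec for _, spec in pairs})
    # A Counter returns 0 for pairs that never occurred.
    rows = [[site] + [pairs[site, spec] for spec in species] for site in sites]
    return species, rows


def write_matrix(species, rows, out):
    """Write the header and the count rows as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Site"] + species)
    writer.writerows(rows)


# Hypothetical sample mirroring the data in the question.
sample = ["Site  Species", "A01   ACRB", "A01   TBL", "A02   TBL", "A03   GRF   "]
species, rows = build_matrix(sample)
buffer = io.StringIO()
write_matrix(species, rows, buffer)
print(buffer.getvalue(), end="")
```

Sorting the sites and species keeps the row and column order stable between runs, which helps when comparing outputs.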
