How to sum the columns of a CSV file more efficiently in Python

Year  Country    Albania  Andorra  Armenia  Austria  Azerbaijan
2009  Lithuania  0        0        0        0        1
2009  Israel     0        7        0        0        0
2008  Israel     1        2        2        0        4
2008  Lithuania  1        5        1        0        8

3 answers

User

Answer 1 · edited 2024-09-29 03:30:36

The trick is to use defaultdict from the collections module. Search for

python defaultdict

and you will find plenty of useful examples. Here is my answer:

import csv
from collections import defaultdict

# slurp the data (the with block closes the file once it has been read)
with open('points.csv', newline='') as f:
    data = list(csv.reader(f))

# massage the data
for i, row in enumerate(data[1:],1):
    data[i] = [int(elt) if elt.isdigit() else elt for elt in row]

points = {} # an empty dictionary
for i, country in enumerate(data[0][2:],2):
    # for each country, store a country -> defaultdict(int) entry in points
    points[country] = defaultdict(int)
    for row in data[1:]:
        opponent = row[1]
        points[country][opponent] += row[i]

# here you can post-process points as you like,
# I'll simply print out the stuff
for country in points:
    for opponent in points[country]:
        print(country, "vs", opponent, "scored",
              points[country][opponent], "points.")

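For the sample data, this prints one line per (country, opponent) pair, for example:

Andorra vs Israel scored 9 points.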

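Edit

If you object to defaultdict, you can use the .get method of an ordinary dict, which lets you supply an optional default value that is returned when the key has not been initialized yet: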

    points[country] = {} # a standard empty dict
    for row in data[1:]:
        opponent = row[1]
        points[country][opponent] = points[country].get(opponent,0) + row[i]

As you can see, it is a bit clunkier, but still manageable.
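The two accumulation styles can be compared side by side in a minimal sketch. The `rows` list below is a hypothetical stand-in for the (Country, Andorra) values of the sample table, not something taken from the answer's code.

```python
from collections import defaultdict

# Hypothetical (opponent, points) pairs: the Country and Andorra
# columns of the sample table, in file order.
rows = [("Lithuania", 0), ("Israel", 7), ("Israel", 2), ("Lithuania", 5)]

# defaultdict(int) creates a missing key with the value 0 on first access.
with_default = defaultdict(int)
for opponent, pts in rows:
    with_default[opponent] += pts

# A plain dict needs an explicit fallback through .get(key, 0).
plain = {}
for opponent, pts in rows:
    plain[opponent] = plain.get(opponent, 0) + pts

print(dict(with_default))  # {'Lithuania': 5, 'Israel': 9}
print(plain)               # {'Lithuania': 5, 'Israel': 9}
```

Both loops build the same mapping; the difference is only where the default of 0 comes from.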

User

Answer 2 · edited 2024-09-29 03:30:36

You can use the Pandas module, which is well suited to this kind of task:

import pandas as pd

df = pd.read_csv('songfestival.csv')
gb = df.groupby('Country')
res = pd.concat([i[1].sum(numeric_only=True) for i in gb], axis=1).T
res.pop('Year')
order = [i[0] for i in gb]

print(order)
print(res)

#['Israel', 'Lithuania']
#   Albania  Andorra  Armenia  Austria  Azerbaijan
#0        1        9        2        0           4
#1        1        5        1        0           9

To query the result for a single column, select that column from res by its name, for example res['Albania'].
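As a variation on the answer above (not the answer's own code), the per-country totals can also be computed with a direct `groupby(...).sum()`, which keeps the country names as the index so that single columns or cells can be looked up by name. This is a sketch that assumes the file is comma-separated; the `SAMPLE` string and the `totals` name are illustrative only.

```python
import io

import pandas as pd

# Assumption: the CSV is comma-separated with the header from the question.
SAMPLE = """Year,Country,Albania,Andorra,Armenia,Austria,Azerbaijan
2009,Lithuania,0,0,0,0,1
2009,Israel,0,7,0,0,0
2008,Israel,1,2,2,0,4
2008,Lithuania,1,5,1,0,8
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Drop Year first so it is not summed, then total every remaining column per Country.
totals = df.drop(columns="Year").groupby("Country").sum()

print(totals["Andorra"])                # one column, indexed by Country
print(totals.loc["Israel", "Andorra"])  # one cell, looked up by name
```

Because `Country` becomes the index, no separate `order` list is needed to tell which row belongs to which country.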

User

Answer 3 · edited 2024-09-29 03:30:36

OK, so you want the rows aggregated across the years, giving one total row per country:

import csv
from collections import defaultdict

with open("songfestival.csv", "r") as ifile:
    reader = csv.DictReader(ifile)
    country_columns = [k for k in reader.fieldnames if k not in ["Year","Country"]]
    data = defaultdict(lambda:defaultdict(int))
    for line in reader:
        curr_country = data[line["Country"]]
        for country_column in country_columns:
            curr_country[country_column] += int(line[country_column])

    with open("songfestival_aggr.csv", "w", newline="") as ofile:
        writer = csv.DictWriter(ofile, fieldnames=country_columns+["Country"])
        writer.writeheader()
        for k, v in data.items():
            row = dict(v)
            row["Country"] = k
            writer.writerow(row)

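I took the liberty of writing the output to another CSV file. Your original data structure is quite error-prone, because it depends on the order of the columns. It is better to use an intermediate dict inside a dict so that each aggregate is addressed by name; see @gboffi's comment on your question.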
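The dict-in-a-dict idea can be tried without touching the file system. The following is a minimal sketch, assuming a comma-separated file with the header from the question; `SAMPLE` and `aggregate_by_country` are hypothetical names introduced here for illustration.

```python
import csv
import io
from collections import defaultdict

# Assumption: the CSV is comma-separated with the header from the question.
SAMPLE = """Year,Country,Albania,Andorra,Armenia,Austria,Azerbaijan
2009,Lithuania,0,0,0,0,1
2009,Israel,0,7,0,0,0
2008,Israel,1,2,2,0,4
2008,Lithuania,1,5,1,0,8
"""

def aggregate_by_country(lines):
    """Return {Country value: {column name: total}} summed over all years."""
    reader = csv.DictReader(lines)
    totals = defaultdict(lambda: defaultdict(int))
    for row in reader:
        row_country = row.pop("Country")
        row.pop("Year")  # the year is not part of the totals
        for column, value in row.items():
            totals[row_country][column] += int(value)
    # Convert to plain dicts so that a missing key raises KeyError.
    return {country: dict(cols) for country, cols in totals.items()}

totals = aggregate_by_country(io.StringIO(SAMPLE))
print(totals["Israel"]["Andorra"])        # 9
print(totals["Lithuania"]["Azerbaijan"])  # 9
```

Every lookup goes through a column name, so reordering the columns in the file does not change the result.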
