Unable to group and sum a CSV file

Posted 2024-05-03 07:59:58

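I created a CSV file with two columns, author and number of books. See the example below (apologies that it does not look like a table; column 1 holds the author and column 2 holds only the number 1):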

Vincent 1
Vincent 1
Vincent 1
Vincent 1
Thomas  1
Thomas  1
Thomas  1
Jimmy   1
Jimmy   1

I am trying to produce an output CSV with the total per author, i.e. Vincent 5, Thomas 3 and Jimmy 2 books.

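With the code below I have managed to reach an intermediate stage where I get a running total for each author. The line print line[0], countAuthor produces the expected result: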

Vincent 1
Vincent 2
Vincent 3
Vincent 4
Thomas  1
Thomas  2
Thomas  3
Jimmy   1
Jimmy   2

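I then planned to put this output into a list, sort it in descending order, and keep only the record with the highest value for each author (that is, skip a record whenever the current author is the same as the previous one), and then write the output to another CSV file.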

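My problem is that I cannot get the author and the running total into a list. I can put them into the variable w: print w[2] works, but print data[2] does not, because data seems to have only one column. Any help would be much appreciated, as I have spent nearly two days on this without much luck. I am forced to use csv because the full file has whitespace etc. in the author names.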

with open("testingtesting6a.csv") as inf:
    data = []
    author = 'XXXXXXXX'
    countAuthor = 0.0
    for line in inf:
        line = line.split(",")
        if line[0] == author:
            countAuthor = countAuthor + float(line[1])
        else:
            countAuthor = float(line[1])
            author = line[0]

        # print line[0], countAuthor

        w = (line[0],line[1],countAuthor)
        print w[2]
        data.append(w)
        print data[2]
        # print data[0]
        # print type(w)
        # print w[2]

2 answers

The standard library already covers this.

import collections

def sum_up(input_file):
    counter = collections.defaultdict(int)
    for line in input_file:
        parts = line.split()  # splits by any whitespace.
        if len(parts) != 2:
            continue  # skip the line that does not parse; maybe a blank line.
        name, number = parts
        counter[name] += int(number)  # you can't borrow 1.25 books.
    return counter

Now you can:

with open('...') as f:
    counts = sum_up(f)

for name, count in sorted(counts.items()):
    print(name, count)  # prints counts sorted by name.

print(counts['Vincent'])  # prints 4.

print(counts['Jane'])  # prints 0.

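The trick here is to use collections.defaultdict, a dict that pretends to have a value for any key. We ask it to use a default value produced by int(), that is, 0.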
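The question also mentions that the real file is comma-separated, that author names may contain spaces, and that the totals should end up in another CSV file in descending order. The following is a minimal sketch of that variant, not a definitive implementation. It assumes a two-column file without a header row; the function name sum_books and the in-memory sample data are made up for illustration.

```python
import collections
import csv
import io


def sum_books(rows):
    """Total the second column per author; rows is an iterable of CSV rows."""
    totals = collections.defaultdict(int)
    for row in rows:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, number = row[0].strip(), row[1].strip()
        totals[name] += int(number)
    return totals


# In-memory sample standing in for the real file (invented for this sketch).
sample = io.StringIO(
    "Vincent van Gogh,1\n"
    "Vincent van Gogh,1\n"
    "Thomas,1\n"
    "\n"
    "Thomas , 1\n"
    "Thomas,1\n"
)
totals = sum_books(csv.reader(sample))

# Sort by total, largest first, and write the result as CSV text.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(ranked)
print(out.getvalue())
```

Because csv.reader splits only on commas, a name containing spaces stays in one field, which line.split() would not guarantee. With a real file, the same function can be fed csv.reader(f) for a file opened with newline='', and the writer can target a second file opened for writing.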

Use strip to remove the whitespace, then groupby with Pandas:

Input file (the optional spaces are intentional):

author,books
Vincent, 1
Vincent , 1
Vincent, 1
Vincent, 1
Thomas  ,  1
Thomas,  1
Thomas,  1
Jimmy,   1
Jimmy  ,   1

import csv
import pandas as pd

# DictReader takes the keys 'author' and 'books' from the header row.
with open('author.csv', 'r', newline='') as fin:
    reader = csv.DictReader(fin, delimiter=',')
    # strip() removes the spaces around each field
    authors = [(d['author'].strip(), int(d['books'].strip())) for d in reader]

df = pd.DataFrame(authors, columns=['author', 'books'])
df2 = df.groupby('author').sum()
print(df2)

         books
author        
Jimmy        2
Thomas       3
Vincent      4

# For total of books:
print (df2.books.sum())
9
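
If Pandas is not available, the same strip-and-group idea can be sketched with the standard library alone. This swaps Pandas' groupby for itertools.groupby, which is a different function: it only merges adjacent rows with equal keys, so the rows are sorted by author first. The sample text mirrors the input file above and is held in memory here as an assumption, in place of author.csv.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Same rows as the input file above, held in memory for the sketch.
text = """author,books
Vincent, 1
Vincent , 1
Vincent, 1
Vincent, 1
Thomas  ,  1
Thomas,  1
Thomas,  1
Jimmy,   1
Jimmy  ,   1
"""

reader = csv.DictReader(io.StringIO(text))
rows = [(d["author"].strip(), int(d["books"].strip())) for d in reader]

# itertools.groupby only merges adjacent equal keys, so sort by author first.
rows.sort(key=itemgetter(0))
totals = {
    author: sum(books for _, books in group)
    for author, group in groupby(rows, key=itemgetter(0))
}

for author, total in totals.items():
    print(author, total)
print(sum(totals.values()))  # total number of books
```

The per-author totals and the grand total agree with the Pandas output shown above.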
