Summing values for duplicate keys in a CSV file without pandas

Posted 2024-09-28 18:16:28


I have a large dataset that looks like this:

party,cp,qualifier,amount
ABC,DEF,GOOGLE_2,100
ABC,DEF,GOOGLE_2,200
GHI,JKL,FACEBOOK_1,500
GHI,JKL,FACEBOOK_1,-600

I would like the output to be:

ABC,DEF,GOOGLE,300
GHI,JKL,FACEBOOK,-100

Here is my Python code so far:

headers = ["valuation_date","party_group_name","type","party_name","cp_group_name","cp_name","qualifier","amount"]
data = {}
with open(t1file,'rb') as f:
        reader = csv.reader(f)
        headers = reader.next()
        for row in reader:
                party = row[headers.index('party')]
                cp = row[headers.index('cp')]
                qualifier = row[headers.index('qualifier')]
                amount = row[headers.index('amount')]
                if row[headers.index('type')] == "Equity":
                        new_qualifier = qualifier.split("_")[0]
                        if party in data.keys():
                                if cp in data.keys():
                                        if new_qualifier in data.keys():
                                                data[party][cp][new_qualifier] += float(amount)
                                        else:
                                                data[party][cp][qualifier][amount] = data[party][cp][new_qualifier][amount]
                                else:
                                        data[cp] = cp
                        else:
                             data[party] = party

Running the code above produces the following error:

  data[party][cp][qualifier][amount] = data[party][cp][new_qualifier][amount]
TypeError: string indices must be integers, not str

I'm quite rusty with Python, so apologies if this is obvious, but any insight into what I'm doing wrong? Thanks!


3 answers

You can use DataFrame.drop_duplicates to drop duplicates across multiple columns and combine it with groupby() and sum to get the desired result:

>>> import pandas as pd
>>> # read the file using pandas.read_csv()
>>> df
  party   cp   qualifier  amount
0   ABC  DEF    GOOGLE_2     100
1   ABC  DEF    GOOGLE_2     200
2   GHI  JKL  FACEBOOK_1     500
3   GHI  JKL  FACEBOOK_1    -600

>>> df['Total'] = df.groupby(['party','cp','qualifier'])['amount'].transform('sum')
>>> print(df.drop_duplicates(subset=['party','cp','qualifier'], keep='last'))

  party   cp   qualifier  amount  Total
1   ABC  DEF    GOOGLE_2     200    300
3   GHI  JKL  FACEBOOK_1    -600   -100

You already have plenty of answers, but let me correct your own code to help you get to the result and understand the original problem:

import csv

# Nested result: data[party][cp][qualifier] -> running total
data = {}
with open('test_data.csv', 'rt', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # column names come from the first row of the file
    for row in reader:
        party = row[headers.index('party')]
        cp = row[headers.index('cp')]
        qualifier = row[headers.index('qualifier')]
        amount = row[headers.index('amount')]
        # Filter on 'type' only when the file has that column
        # (the sample data in the question does not)
        if 'type' in headers and row[headers.index('type')] != "Equity":
            continue
        new_qualifier = qualifier.split("_")[0]
        if party in data:
            if cp in data[party]:
                if new_qualifier in data[party][cp]:
                    data[party][cp][new_qualifier] += float(amount)
                else:
                    data[party][cp][new_qualifier] = float(amount)
            else:
                data[party][cp] = {new_qualifier: float(amount)}
        else:
            # First time this party is seen: create every level as a dict
            data[party] = {cp: {new_qualifier: float(amount)}}
print(data)

This gives you:

{'ABC': {'DEF': {'GOOGLE': 300.0}}, 'GHI': {'JKL': {'FACEBOOK': -100.0}}}

The problem lies in how you populate your dictionary and how you access it.
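To make that concrete, here is a minimal sketch of why the question's code fails and one possible way around it. Assigning `data[party] = party` stores a string under the key, so a later `data[party][cp]` indexes a string with a string. The names `broken`, `totals` and `rows` are illustrative and do not come from the original post, and the nested `defaultdict` is just one option, not the only fix.

```python
from collections import defaultdict

# What the question's code does: the value stored under the key is a str.
broken = {}
broken["ABC"] = "ABC"
try:
    broken["ABC"]["DEF"]          # indexing a str with a str
except TypeError as exc:
    error_message = str(exc)      # exact wording differs between Python versions

# One possible fix: party -> cp -> qualifier -> running total.
totals = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

# Sample rows taken from the question's data.
rows = [
    ("ABC", "DEF", "GOOGLE_2", "100"),
    ("ABC", "DEF", "GOOGLE_2", "200"),
    ("GHI", "JKL", "FACEBOOK_1", "500"),
    ("GHI", "JKL", "FACEBOOK_1", "-600"),
]
for party, cp, qualifier, amount in rows:
    # Strip the "_N" suffix before using the qualifier as a key.
    totals[party][cp][qualifier.split("_")[0]] += float(amount)

print(totals["ABC"]["DEF"]["GOOGLE"])      # 300.0
print(totals["GHI"]["JKL"]["FACEBOOK"])    # -100.0
```

Because every missing level is created on first access, none of the `if key in data` checks are needed.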

See below:

from collections import defaultdict

PARTY_IDX = 0
CP_IDX = 1
QUALIFIER_IDX = 2
AMOUNT_IDX = 3

# Totals keyed by "party,cp,qualifier"; missing keys start at 0
data = defaultdict(int)

with open('del-me.csv') as f:
    next(f, None)  # skip the header row
    for line in f:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(',')
        party = fields[PARTY_IDX]
        cp = fields[CP_IDX]
        # Drop the "_N" suffix; split() is safe even when there is no underscore
        qualifier = fields[QUALIFIER_IDX].split('_')[0]
        key = ','.join([party, cp, qualifier])
        data[key] += int(fields[AMOUNT_IDX])

with open('out.csv', 'w') as f:
    for k, v in data.items():
        f.write('{},{}\n'.format(k, v))

del-me.csv

party,cp,qualifier,amount
ABC,DEF,GOOGLE_2,100
ABC,DEF,GOOGLE_2,200
GHI,JKL,FACEBOOK_1,500
GHI,JKL,FACEBOOK_1,-600

out.csv

ABC,DEF,GOOGLE,300
GHI,JKL,FACEBOOK,-100
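
As a variation on the same idea, here is a sketch that uses the standard `csv` module with tuple keys instead of splitting and joining strings by hand. It assumes the four-column layout shown above and integer amounts. The helper name `sum_by_key` and the `SAMPLE` constant are hypothetical, and `io.StringIO` stands in for the real input and output files so the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# Same sample data as the input file above.
SAMPLE = """party,cp,qualifier,amount
ABC,DEF,GOOGLE_2,100
ABC,DEF,GOOGLE_2,200
GHI,JKL,FACEBOOK_1,500
GHI,JKL,FACEBOOK_1,-600
"""

def sum_by_key(lines):
    """Sum 'amount' per (party, cp, qualifier without its "_N" suffix)."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        key = (row["party"], row["cp"], row["qualifier"].split("_")[0])
        totals[key] += int(row["amount"])
    return totals

totals = sum_by_key(io.StringIO(SAMPLE))

# Write the totals back out as CSV; csv.writer handles quoting if needed.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for key, total in totals.items():
    writer.writerow([*key, total])

print(out.getvalue(), end="")
```

With a real file, passing the object returned by `open(path, newline='')` to `sum_by_key` should work the same way, since `csv.DictReader` accepts any iterable of lines.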
