In Python, how do I merge rows that share a duplicate value in one column and keep the row with the max value from a different column?

+-----------+--------+---------+---------+
| reference | amount | column3 | column4 |
+-----------+--------+---------+---------+
| test1     | 9      | 45      | ye      |
| test1     | 200    | 45      | agag    |
| test1     | 1      | 45      | aaa     |
| test2     | 99     | 45      | bbab    |
| test1     | 11     | 45      | value   |
+-----------+--------+---------+---------+

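3 Answers

User

Answer #1 · edited 2024-09-30 01:33:07

pandas is a very good Python module for working with tabular data. It is much like R and gives you a kind of in-memory database. For your example it is simple: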

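import pandas as pd

df = pd.read_csv('test.csv')
# Largest amount per reference, keeping 'reference' as a regular column.
a = df.groupby('reference', as_index=False)['amount'].max()
# Merge on both columns so an amount only matches rows of its own reference.
answer = df.merge(a, on=['reference', 'amount'])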

The result can then be saved back to CSV, for example with the DataFrame's to_csv method.
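The save step can be sketched roughly as follows. This is a minimal example, assuming the merged result is a DataFrame named `answer` as in the snippet above. The inline rows stand in for that result, and the file name `result.csv` mentioned in the comment is hypothetical, not taken from the original answer.

```python
import io

import pandas as pd

# Stand-in for the merged result; these rows are illustrative.
answer = pd.DataFrame(
    {
        "reference": ["test1", "test2"],
        "amount": [200, 99],
        "column3": [45, 45],
        "column4": ["agag", "bbab"],
    }
)

# index=False keeps the pandas row index out of the output.
# An in-memory buffer is used here; pass a path such as "result.csv"
# (hypothetical name) to write a real file instead.
buffer = io.StringIO()
answer.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Without `index=False`, the output would gain an extra unnamed first column holding the row index.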

This assumes your data file test.csv looks like this:

reference,amount,column3,column4
test1,9,45,ye
test1,200,45,agag
test1,1,45,aaa
test2,99,45,bbab
test1,11,45,value
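With data shaped like the file above, one possible alternative to the merge step is `idxmax`, which picks the row label of the largest `amount` in each group. This is only a sketch: the data is inlined through `io.StringIO` so it runs without test.csv, and the names `SAMPLE` and `best_rows` are made up for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without test.csv.
SAMPLE = """reference,amount,column3,column4
test1,9,45,ye
test1,200,45,agag
test1,1,45,aaa
test2,99,45,bbab
test1,11,45,value
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# For each reference, idxmax returns the index label of the row
# holding the largest amount (the first such row if there is a tie).
idx = df.groupby("reference")["amount"].idxmax()

# Select those full rows, so column3 and column4 come along unchanged.
best_rows = df.loc[idx].reset_index(drop=True)
print(best_rows)
```

Because exactly one label is chosen per group, this returns a single row per reference even when the maximum amount occurs more than once.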

User

Answer #2 · edited 2024-09-30 01:33:07

Something like the following would be a good start:

import csv
import collections

with open("mydata.csv", 'r', newline='') as f_input:
    csv_input = csv.reader(f_input)
    # Assuming the first row contains the heading names, otherwise remove.
    headings = next(csv_input)
    # Maps each reference to the row with the largest amount seen so far.
    d_max_rows = collections.OrderedDict()

    for cols in csv_input:
        reference = cols[0]
        if reference in d_max_rows:
            cur_max = d_max_rows[reference]
            # >= means the later row wins when amounts are equal.
            if int(cols[1]) >= int(cur_max[1]):
                d_max_rows[reference] = cols
        else:
            d_max_rows[reference] = cols

lrows = [headings] + list(d_max_rows.values())

for reference, amount, col3, col4 in lrows:
    print("%-15s %-10s %-10s %-10s" % (reference, amount, col3, col4))

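Assuming mydata.csv contains the sample data from the question, this gives the following output:

reference       amount     column3    column4
test1           200        45         agag
test2           99         45         bbab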
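The same idea can be sketched with `csv.DictReader`, so that columns are addressed by name rather than by position. This is an illustration rather than the answer's own code: the helper `rows_with_max` and its parameters are hypothetical, and the sample data is inlined instead of being read from mydata.csv. Note that it uses `>` rather than `>=`, so the first row wins on a tie.

```python
import csv
import io


def rows_with_max(rows, key_col, value_col):
    """Return one row per key_col value: the row with the largest value_col."""
    best = {}  # plain dicts preserve insertion order in Python 3.7+
    for row in rows:
        current = best.get(row[key_col])
        # Compare as integers; CSV fields are always read as strings.
        if current is None or int(row[value_col]) > int(current[value_col]):
            best[row[key_col]] = row
    return list(best.values())


# Sample data from the question, inlined for a self-contained run.
SAMPLE = """reference,amount,column3,column4
test1,9,45,ye
test1,200,45,agag
test1,1,45,aaa
test2,99,45,bbab
test1,11,45,value
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
result = rows_with_max(reader, "reference", "amount")
for row in result:
    print(row["reference"], row["amount"], row["column3"], row["column4"])
```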

User

Answer #3 · edited 2024-09-30 01:33:07

Here is some code that does what you need:

from collections import namedtuple
import csv

Record = namedtuple('Record', 'reference amount column3 column4')

# Maps each reference to the record with the largest amount seen so far.
# Note: every row is passed to int(), so the input must not have a header row.
no_dups = {}
with open('references.csv', 'r', newline='') as csvfile:
    for rec in map(Record._make, csv.reader(csvfile)):
        if (rec.reference not in no_dups or
            int(no_dups[rec.reference].amount) < int(rec.amount)):
            no_dups[rec.reference] = rec

with open('references_out.csv', 'w', newline='') as csvfile:
    csv.writer(csvfile).writerows(no_dups.values())
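The code above converts the amount of every row with `int()`, so a header line such as `reference,amount,column3,column4` would raise a `ValueError`. A minimal sketch of one way to handle that, assuming the file does start with a header row: read the header first, build the namedtuple from it, and write it back out. In-memory buffers replace the two files here so the example is self-contained, and `SAMPLE` and `output_lines` are illustrative names.

```python
import csv
import io
from collections import namedtuple

# Sample data from the question, including the header row.
SAMPLE = """reference,amount,column3,column4
test1,9,45,ye
test1,200,45,agag
test1,1,45,aaa
test2,99,45,bbab
test1,11,45,value
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # consume the header so int() never sees "amount"
Record = namedtuple("Record", header)

no_dups = {}
for rec in map(Record._make, reader):
    kept = no_dups.get(rec.reference)
    # Strict < keeps the first record when two amounts are equal.
    if kept is None or int(kept.amount) < int(rec.amount):
        no_dups[rec.reference] = rec

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)             # put the header back in the output
writer.writerows(no_dups.values())  # one record per reference
output_lines = out.getvalue().splitlines()
print("\n".join(output_lines))
```

Building the namedtuple from the header only works when the column names are valid Python identifiers, as they are in this sample.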
