Finding the maximum value in a CSV file with mrjob

Posted 2024-09-28 21:55:15


I have a CSV file like this:

date,value,id_point,coordinateX,coordinateY,station   
{'$date': '2020-03-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance  
{'$date': '2020-02-28T04:45:00Z'},45,A122,39.45855,-0.45851,AvSpain   
{'$date': '2020-04-28T04:45:00Z'},33,A107,39.45855,-0.45851,AvFrance   
{'$date': '2020-05-28T04:45:00Z'},12,A133,39.45855,-0.45851,AvItaly   
{'$date': '2020-06-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance   
{'$date': '2020-07-28T04:45:00Z'},77,A117,39.45855,-0.45851,AvSpain   
{'$date': '2020-08-28T04:45:00Z'},46,A122,39.45855,-0.45851,AvSpain   
{'$date': '2020-09-28T04:45:00Z'},51,A198,39.45855,-0.45851,AvItaly 

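I need to write a MapReduce program with the MRJob class that finds the maximum traffic value for each station and shows the date on which that value was recorded. This is what I tried: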

from mrjob.job import MRJob   

class Maxim(MRJob):

    def mapper(self, key, line):
        (date,value,id_point,coordinateX,coordinateY,station) = line.split(',')
        if date !='date':
            yield str(station), int(value)

    def reducer(self, station, value, date):
        yield station, date, max(value)

if __name__ == '__main__':
    Maxim.run()

I can't find the problem. Please help.
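
A likely cause, offered as a sketch rather than a definitive fix: mrjob calls a reducer with two arguments besides self (the key and an iterator over that key's values), so a reducer declared as (self, station, value, date) cannot be called. The mapper also never emits the date, so the reducer has nothing to report. One way around both is to emit a (value, date) pair per station and take the pair with the largest value. The names map_line, reduce_station and MaxPerStation are hypothetical, and the try/except fallback is only there so the helper functions can be tried without mrjob installed.

```python
try:
    from mrjob.job import MRJob
except ImportError:
    # Assumption for this sketch only: fall back to a plain base class so the
    # helper functions below can still be exercised without mrjob installed.
    MRJob = object


def map_line(line):
    """Yield (station, (value, date)) for one CSV data line; skip the header."""
    # Strip each field: the sample rows carry trailing spaces after the station.
    fields = [field.strip() for field in line.split(',')]
    if len(fields) != 6 or fields[0] == 'date':
        return
    date, value, _id_point, _coord_x, _coord_y, station = fields
    yield station, (int(value), date)


def reduce_station(station, pairs):
    """Yield the station with the date and value of its largest reading."""
    # Compare on the numeric value only; on ties the first pair seen wins.
    value, date = max(pairs, key=lambda pair: pair[0])
    yield station, (date, value)


class MaxPerStation(MRJob):

    def mapper(self, _, line):
        yield from map_line(line)

    def reducer(self, station, pairs):
        # Two arguments besides self: the key and an iterator of its values.
        yield from reduce_station(station, pairs)
```

To run it as a job, keep the guard from the original script and call MaxPerStation.run() inside it. On the sample rows above this should report 33 for AvFrance (2020-04-28), 77 for AvSpain (2020-07-28) and 51 for AvItaly (2020-09-28). The date is passed through as the raw first column; extracting just the timestamp from it is left out of this sketch.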

