Average of groupby elements

Posted 2024-10-01 07:33:55


I have a list like this:

58308.803701    132.227.127.170 50602   149.13.32.15      443   6   64
58308.815456    149.13.32.15    443     132.227.127.170   50602 6   60
58308.815524    132.227.127.170 50602   149.13.32.15      443   6   52
58308.817244    132.227.127.170 50602   149.13.32.15      443   6   57
58308.828987    149.13.32.15    443     132.227.127.170   50602 6   52
58308.829133    149.13.32.15    443     132.227.127.170   50602 6   57
58308.829169    132.227.127.170 50602   149.13.32.15      443   6   52
58308.912361    132.227.127.170 50603   86.4.136.93       443   6   64
58308.912497    132.227.127.170 50599   94.31.112.216     443   6   95
58308.912568    132.227.127.170 50599   94.31.112.216     443   6   96
58308.912977    132.227.127.170 50599   94.31.112.216     443   6   847
58308.913411    132.227.127.170 50599   94.31.112.216     443   6   154
58308.913484    132.227.127.170 50599   94.31.112.216     443   6   233
....
....
....

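I want to group each run of consecutive similar lines (lines whose five middle columns are identical) and, in the output, show the minimum of the first column together with the min, max, mean, median, ... (every statistic that applies) of the last column, like this: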

58308.803701                            132.227.127.170 50602   149.13.32.15      443   6   64
58308.815456                            149.13.32.15    443     132.227.127.170   50602 6   60
min of(58308.815524,58308.817244)       132.227.127.170 50602   149.13.32.15      443   6   min/max/avg/...of(52,57)
min of(58308.828987,58308.829133)       149.13.32.15    443     132.227.127.170   50602 6   min/max/avg/...of(52,57)
58308.829169                            132.227.127.170 50602   149.13.32.15      443   6   52
58308.912361                            132.227.127.170 50603   86.4.136.93       443   6   64
min of(58308.912497,..,58308.913484)    132.227.127.170 50599   94.31.112.216     443   6   min/max/avg/...of(95,96,847,154,233)
....
....
....

Here is the code I have written so far while trying to get this to work:

from itertools import groupby
import re
import numpy as np

tstFile = open("output", "w+")
with open('dataInput', 'r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a, b in groupby(f1, key=lambda x: x[1:6]):
        tstFile.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % (
            min(x[0] for x in b), min(x[6] for x in b), max(x[6] for x in b),
            np.average(x[6] for x in b), np.mean(x[6] for x in b),
            np.median(x[6] for x in b), np.std(x[6] for x in b)))
tstFile.close()

But nothing really seems to work. It only works for min and max, and then only if I compute a single value per group, like this:

tstFile=open("output","w+")
with open('dataInput','r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a,b in groupby(f1,key=lambda x:x[1:6]):
        tstFile.write("%s\n" %(min(x[6] for x in b)))
tstFile.close()

Please help!


1 answer
User
Answer #1 · Posted 2024-10-01 07:33:55

When working with csv files, it is generally recommended to use the csv module. I have provided sample code below that shows how to solve this problem.

If the input file is tab-delimited, change to delimiter='\t' and remove skipinitialspace=True from csv.reader. The sample input contains no tabs, but they may have been lost during copy/paste.
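Before the full example, a note on why the attempt in the question fails. Each group yielded by itertools.groupby is a one-shot iterator, so the first `x for x in b` consumes it and every later pass sees nothing; `min()` on an empty iterable raises ValueError, and NumPy functions expect array-like input rather than a generator expression. The short sketch below illustrates this on two rows copied from the question (the variable names are only illustrative) and shows the usual workaround of storing the group in a list first.

```python
from itertools import groupby

# Two consecutive rows from the question that share the same five middle columns
rows = [
    ["58308.815524", "132.227.127.170", "50602", "149.13.32.15", "443", "6", "52"],
    ["58308.817244", "132.227.127.170", "50602", "149.13.32.15", "443", "6", "57"],
]

for key, group in groupby(rows, key=lambda r: r[1:6]):
    first_pass = [r[6] for r in group]   # consumes the group iterator
    second_pass = [r[6] for r in group]  # the iterator is already exhausted

print(first_pass)   # ['52', '57']
print(second_pass)  # []

# Workaround: materialise the group once, then reuse the list as often as needed
for key, group in groupby(rows, key=lambda r: r[1:6]):
    members = list(group)
    times = [float(r[0]) for r in members]
    sizes = [float(r[6]) for r in members]

print(min(times), min(sizes), max(sizes))
```

Converting to float also matters: the fields come out of split() as strings, so min and max would otherwise compare them lexicographically.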

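import csv
from itertools import groupby
import numpy as np

# Python 3: open csv files in text mode with newline='' (the original 'wb' mode is Python 2 only)
with open('data.csv', newline='') as in_file, open('out.csv', 'w', newline='') as out_file:
    reader = csv.reader(in_file, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(out_file, delimiter='\t')
    # Consecutive rows that share columns 1-5 form one group
    for key, group in groupby(reader, key=lambda r: r[1:6]):
        # Store the group in a list, then pull columns 0 and 6 out as float arrays
        col0, col6 = np.array(list(group))[:, [0, 6]].transpose().astype(float)
        writer.writerow([min(col0)] + key + [int(min(col6)), int(max(col6)),
                                             np.mean(col6)])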

Output (I added some tabs for readability):

58308.803701    132.227.127.170 50602   149.13.32.15    443     6   64  64  64.0
58308.815456    149.13.32.15    443     132.227.127.170 50602   6   60  60  60.0
58308.815524    132.227.127.170 50602   149.13.32.15    443     6   52  57  54.5
58308.828987    149.13.32.15    443     132.227.127.170 50602   6   52  57  54.5
58308.829169    132.227.127.170 50602   149.13.32.15    443     6   52  52  52.0
58308.912361    132.227.127.170 50603   86.4.136.93     443     6   64  64  64.0
58308.912497    132.227.127.170 50599   94.31.112.216   443     6   95  847 285.0
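
The question also asks for the median and other statistics. As a rough sketch, the same grouping can collect more of them per group. It swaps NumPy for the standard-library statistics module so it runs without extra packages, and it uses `statistics.pstdev` (population standard deviation), which I am assuming is the counterpart of `np.std` with its default settings. The `summarize` helper and the dictionary keys are hypothetical names chosen for illustration; the sample lines are copied from the question's input.

```python
import statistics
from itertools import groupby

# Sample lines copied from the question's input
lines = """\
58308.912361    132.227.127.170 50603   86.4.136.93       443   6   64
58308.912497    132.227.127.170 50599   94.31.112.216     443   6   95
58308.912568    132.227.127.170 50599   94.31.112.216     443   6   96
58308.912977    132.227.127.170 50599   94.31.112.216     443   6   847
58308.913411    132.227.127.170 50599   94.31.112.216     443   6   154
58308.913484    132.227.127.170 50599   94.31.112.216     443   6   233
""".splitlines()


def summarize(lines):
    """Group consecutive rows by columns 1-5 and summarise column 6 per group."""
    rows = (line.split() for line in lines if line.strip())
    results = []
    for key, group in groupby(rows, key=lambda r: r[1:6]):
        members = list(group)  # store the one-shot group before reusing it
        sizes = [int(r[6]) for r in members]
        results.append({
            # keep the earliest timestamp as its original string
            "first_seen": min(members, key=lambda r: float(r[0]))[0],
            "key": key,
            "count": len(sizes),
            "min": min(sizes),
            "max": max(sizes),
            "mean": statistics.mean(sizes),
            "median": statistics.median(sizes),
            "pstdev": statistics.pstdev(sizes),
        })
    return results


results = summarize(lines)
for s in results:
    stats = [s[name] for name in ("min", "max", "mean", "median", "pstdev")]
    print("\t".join([s["first_seen"]] + s["key"] + [str(v) for v in stats]))
```

Because str.split() with no argument splits on any run of whitespace, this variant treats tabs and spaces alike. Like the code above, it only merges rows that are adjacent in the file, which matches the expected output in the question.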
