I have a list like this:
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60
58308.815524 132.227.127.170 50602 149.13.32.15 443 6 52
58308.817244 132.227.127.170 50602 149.13.32.15 443 6 57
58308.828987 149.13.32.15 443 132.227.127.170 50602 6 52
58308.829133 149.13.32.15 443 132.227.127.170 50602 6 57
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52
58308.912361 132.227.127.170 50603 86.4.136.93 443 6 64
58308.912497 132.227.127.170 50599 94.31.112.216 443 6 95
58308.912568 132.227.127.170 50599 94.31.112.216 443 6 96
58308.912977 132.227.127.170 50599 94.31.112.216 443 6 847
58308.913411 132.227.127.170 50599 94.31.112.216 443 6 154
58308.913484 132.227.127.170 50599 94.31.112.216 443 6 233
....
....
....
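I want to group consecutive similar lines (lines whose five middle columns are identical) and, for each group, output the minimum of the first column together with statistics on the last column: minimum, maximum, mean, median, and so on (every available statistic), as shown below: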
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60
min of(58308.815524,58308.817244) 132.227.127.170 50602 149.13.32.15 443 6 min/max/avg/...of(52,57)
min of(58308.828987,58308.829133) 149.13.32.15 443 132.227.127.170 50602 6 min/max/avg/...of(52,57)
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52
58308.912361 132.227.127.170 50603 86.4.136.93 443 6 64
min of(58308.912497,..,58308.913484) 132.227.127.170 50599 94.31.112.216 443 6 min/max/avg/...of(95,96,847,154,233)
....
....
....
Here is the code I have written so far while trying to get this to work:
from itertools import groupby
import re
import numpy as np
tstFile=open("output","w+")
with open('dataInput','r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a,b in groupby(f1,key=lambda x:x[1:6]):
        tstFile.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\n" %(min(x[0] for x in b),min(x[6] for x in b),max(x[6] for x in b),np.average(x[6] for x in b),np.mean(x[6] for x in b),np.median(x[6] for x in b),np.std(x[6] for x in b)))
tstFile.close()
But nothing really seems to work. It only works for min and max, and even then I can only compute a single value per group, like this:
tstFile=open("output","w+")
with open('dataInput','r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a,b in groupby(f1,key=lambda x:x[1:6]):
        tstFile.write("%s\n" %(min(x[6] for x in b)))
tstFile.close()
Please help!
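The symptom described in the question (only one statistic per group works) is consistent with how itertools.groupby behaves: each group it yields is a one-shot iterator, so the first generator expression over b consumes it and every later one sees nothing. The columns are also still strings at that point, so they would need converting before numeric statistics make sense. A minimal sketch of the effect and of one possible workaround (copying the group into a list first); the function names and the two sample rows are only illustrative:

```python
from itertools import groupby

# Two rows taken from the sample input; they share the same five middle columns.
rows = [
    ["58308.815524", "132.227.127.170", "50602", "149.13.32.15", "443", "6", "52"],
    ["58308.817244", "132.227.127.170", "50602", "149.13.32.15", "443", "6", "57"],
]


def first_and_second_pass(rows):
    """Show what two successive passes over the same groupby group yield."""
    results = []
    for key, group in groupby(rows, key=lambda r: r[1:6]):
        first = [r[6] for r in group]   # this pass consumes the group iterator
        second = [r[6] for r in group]  # nothing is left for a second pass
        results.append((first, second))
    return results


def materialized(rows):
    """Copy each group into a list so it can be traversed several times."""
    results = []
    for key, group in groupby(rows, key=lambda r: r[1:6]):
        group = list(group)
        times = [float(r[0]) for r in group]  # convert strings to numbers
        sizes = [int(r[6]) for r in group]
        results.append((min(times), min(sizes), max(sizes)))
    return results
```

Once the group is a list, any number of statistics can be computed from it in the same loop iteration.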
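When working with CSV files, it is generally recommended to use the csv module. I have provided sample code below that demonstrates how to solve this problem.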
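As a rough sketch of what a csv-based approach could look like (not necessarily the code originally posted), assuming space-separated input, seven columns per line, and grouping of consecutive rows only. The names summarize and SAMPLE are hypothetical, and the standard-library statistics module stands in for NumPy to keep the example self-contained; pstdev is the population standard deviation, which is also what np.std computes by default:

```python
import csv
import statistics
from itertools import groupby

# First seven lines of the sample input from the question.
SAMPLE = """\
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60
58308.815524 132.227.127.170 50602 149.13.32.15 443 6 52
58308.817244 132.227.127.170 50602 149.13.32.15 443 6 57
58308.828987 149.13.32.15 443 132.227.127.170 50602 6 52
58308.829133 149.13.32.15 443 132.227.127.170 50602 6 57
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52
"""


def summarize(lines):
    """Group consecutive rows sharing columns 2-6 and compute statistics."""
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    # Skip anything that does not have exactly seven fields (e.g. "...." lines).
    rows = (row for row in reader if len(row) == 7)
    summary = []
    for key, group in groupby(rows, key=lambda r: r[1:6]):
        group = list(group)  # materialize: a groupby group is a one-shot iterator
        times = [float(r[0]) for r in group]
        sizes = [int(r[6]) for r in group]
        summary.append({
            "time_min": min(times),
            "key": key,
            "size_min": min(sizes),
            "size_max": max(sizes),
            "size_mean": statistics.mean(sizes),
            "size_median": statistics.median(sizes),
            "size_pstdev": statistics.pstdev(sizes),
        })
    return summary


if __name__ == "__main__":
    for item in summarize(SAMPLE.splitlines()):
        stats = [item["time_min"], *item["key"], item["size_min"], item["size_max"],
                 item["size_mean"], item["size_median"], item["size_pstdev"]]
        print("\t".join(str(value) for value in stats))
```

To read from a file instead, an open file object can be passed to summarize directly, since csv.reader accepts any iterable of lines.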
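If the input file is tab-delimited, change the delimiter to delimiter='\t' and remove skipinitialspace=True from the csv.reader call. No tabs are present in the sample input, but they may have been lost during copy/paste.
Output (I added some tabs to improve readability):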