A set of command-line statistics tools
Detailed description of the stats-tools Python project
Installation
Installing stats-tools
Install stats-tools using easy_install:
sudo easy_install stats-tools
Or check it out from the git repository:
git clone git://github.com/jweslley/stats-tools.git
cd stats-tools
sudo python setup.py install
Utilities
- min - Calculate the minimum of a number sequence
- max - Calculate the maximum of a number sequence
- mean - Calculate the mean of a number sequence
- median - Calculate the median of a number sequence
- std - Calculate the standard deviation of a number sequence
- var - Calculate the variance of a number sequence
- sum - Calculate the sum of a number sequence
- stats - Output a summary table including mean, median, minimum, maximum, standard deviation and variance of a number sequence
- summary - Output a summary table including minimum, lower quartile, median, upper quartile, maximum of a number sequence
- fivenum - Calculate Tukey’s five number summary (minimum, lower-hinge, median, upper-hinge, maximum) of a number sequence based on 1.5 times the interquartile distance
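To make the five number summary concrete, here is a minimal Python sketch of Tukey's hinges as they are commonly defined: the median of each half of the sorted data, where both halves include the middle value when the count is odd. The function name `fivenum` is reused for illustration only; this is not taken from the stats-tools source, and the tool's own results may differ in edge cases.

```python
from statistics import median


def fivenum(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    data = sorted(values)
    if not data:
        raise ValueError("fivenum requires at least one value")
    n = len(data)
    # Size of each half; the middle value belongs to both halves when n is odd.
    half = (n + 1) // 2
    lower = data[:half]
    upper = data[n - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(fivenum([1, 3, 6, 4, 9]))
```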
Usage
All utilities take a file in tabular format as input and perform a calculation on it. A sample input file looks like this:
1 2 4
3 5 4
6 4 6
4 5 6
9 12 16
Given this input file, call it example1.dat, you can compute statistics such as:
The max value of the first column:
max example1.dat
The min value of the second column:
min -c 2 example1.dat
You can also use negative numbers to count columns from the right. So, the sum of the values in the last column:
sum -c -1 example1.dat
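The column numbering used by -c can be sketched in Python as follows. This is an illustration, not the stats-tools implementation: the helper `column` is hypothetical, and the sample rows assume example1.dat is laid out as five rows of three columns.

```python
def column(rows, c=1):
    """Return the c-th field of every row as floats.

    c is 1-based; a negative c counts from the right, so -1 is the last column.
    """
    if c == 0:
        raise ValueError("column numbers start at 1 (or -1 from the right)")
    index = c - 1 if c > 0 else c
    return [float(row.split()[index]) for row in rows]


# Assumed layout of example1.dat: five rows, three columns.
rows = ["1 2 4", "3 5 4", "6 4 6", "4 5 6", "9 12 16"]

first_max = max(column(rows))       # like: max example1.dat
second_min = min(column(rows, 2))   # like: min -c 2 example1.dat
last_sum = sum(column(rows, -1))    # like: sum -c -1 example1.dat
print(first_max, second_min, last_sum)
```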
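If the columns of the input file are separated by a character other than whitespace (space, tab, newline, carriage return, form feed), use the -s option to specify it. The next example outputs a summary of the second column of the following file (example2.dat):
"A",10,12
"A",11,14
"B",5,8
"B",6,10
"A",10.5,13
"B",7,11
Compute the summary: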
summary -c 2 -s , example2.dat
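The effect of a custom separator can be sketched in Python as below. The helper `split_fields` is hypothetical and only mirrors the idea behind -s; the quartiles are left out because the interpolation method used by summary is not stated here.

```python
from statistics import median


def split_fields(line, sep=None):
    """Split a line on sep; sep=None means any run of whitespace."""
    return line.split(sep)


lines = ['"A",10,12', '"A",11,14', '"B",5,8', '"B",6,10', '"A",10.5,13', '"B",7,11']

# Second column (index 1) of the comma-separated rows.
values = [float(split_fields(line, ",")[1]) for line in lines]
lowest, middle, highest = min(values), median(values), max(values)
print(lowest, middle, highest)
```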
Often a data file contains a header, that is, a first line describing the columns, as in the example3.dat file shown below:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac abs moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!air, moon roof, loaded",4799.00
The -b option removes the first line from the calculation. In this case, the mean price of the cars is given by:
mean -b -s, -c-1 test/example3.dat
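For comparison, the same calculation can be sketched with Python's standard csv module, which understands quoted fields that contain commas. This is an illustrative equivalent under that assumption, not a description of how stats-tools parses its input.

```python
import csv
import io
from statistics import mean

# Contents of example3.dat, inlined so the sketch is self-contained.
data = '''Year,Make,Model,Description,Price
1997,Ford,E350,"ac abs moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!air, moon roof, loaded",4799.00
'''

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header line, as -b does
prices = [float(row[-1]) for row in reader]  # last column, as -c-1 does
average = mean(prices)
print(average)
```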
Piping data
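All stats-tools utilities read data from standard input if no file is given. The command below computes the maximum value of the second column among the lines of foo.dat that contain the word bar: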
grep bar foo.dat | max -c 2
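The fall-back to standard input can be sketched in Python as below. The names `read_lines` and `max_of_column` are hypothetical, and the in-memory stream stands in for the output of grep; the real utilities may handle input differently.

```python
import io
import sys


def read_lines(path=None, stdin=sys.stdin):
    """Read lines from path, or from stdin when no path is given."""
    if path is None:
        return [line.rstrip("\n") for line in stdin]
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def max_of_column(lines, c=1):
    """Maximum of the c-th whitespace-separated column (1-based, negative from the right)."""
    index = c - 1 if c > 0 else c
    return max(float(line.split()[index]) for line in lines if line.strip())


# Simulated pipe: these lines play the role of `grep bar foo.dat` output.
piped = io.StringIO("bar 3\nbar 7\nbar 5\n")
result = max_of_column(read_lines(stdin=piped), c=2)
print(result)
```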