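I'm new to pandas and want to compute the mean of BasePay from the Salaries.csv file downloaded from the "SF Salaries" dataset on kaggle.com. However, the extra comma "," inside the JobTitle field (for example, Id 5) seems to cause a problem, because the default field separator is ",".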
Id,EmployeeName,JobTitle,BasePay,OvertimePay,OtherPay,Benefits,TotalPay,TotalPayBenefits,Year,Notes,Agency,Status
1,NATHANIEL FORD,GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY,167411.18,0.0,400184.25,,567595.43,567595.43,2011,,San Francisco,
2,GARY JIMENEZ,CAPTAIN III (POLICE DEPARTMENT),155966.02,245131.88,137811.38,,538909.28,538909.28,2011,,San Francisco,
3,ALBERT PARDINI,CAPTAIN III (POLICE DEPARTMENT),212739.13,106088.18,16452.6,,335279.91,335279.91,2011,,San Francisco,
4,CHRISTOPHER CHONG,WIRE ROPE CABLE MAINTENANCE MECHANIC,77916.0,56120.71,198306.9,,332343.61,332343.61,2011,,San Francisco,
5,PATRICK GARDNER,"DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)",134401.6,9737.0,182234.59,,326373.19,326373.19,2011,,San Francisco,
One approach I've found so far is to edit the file with sed, replacing the comma inside the quoted field with " | ":
sed 's/\(\"[^",]\{1,\}\),\([^",]\{1,\}\"\)/\1 | \2/g'
and then load the result with:
import pandas as pd
sal = pd.read_csv('/Users/Downloads/Salaries.csv')
sal['BasePay'].mean()
Does pandas offer any other way to clean this data?
Use a small function to strip the unwanted commas from the fields.
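The function itself is not shown in this answer, so the following is only a minimal sketch of what such a helper might look like, not the answerer's actual code. It assumes the stray commas sit inside double-quoted fields, as in the Id 5 row, and uses Python's standard csv module to parse each row before replacing the commas. The name strip_field_commas and the replacement parameter are hypothetical.

```python
import csv
import io


def strip_field_commas(text, replacement=" "):
    """Return CSV text with commas inside individual fields replaced.

    Hypothetical helper: csv.reader honours the double quotes, so a quoted
    field such as "A,(B)" arrives as one value and can be cleaned safely.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace(",", replacement) for field in row])
    return out.getvalue()


# Two lines modelled on the sample in the question (columns shortened).
sample = (
    "Id,EmployeeName,JobTitle,BasePay\n"
    '5,PATRICK GARDNER,"DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)",134401.6\n'
)
cleaned = strip_field_commas(sample)
print(cleaned)
```

Once the embedded comma is gone, the writer no longer needs to quote the field, so every line ends up with the same number of separators.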
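Because the BasePay column in this dataset contains string values, it is better to replace the "Not provided" values with 0.00 and convert the column to float before taking the mean.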
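As a rough sketch of that suggestion, the snippet below builds a tiny in-memory CSV rather than reading the real file. The third row and its "Not provided" entry are made up for illustration, and the exact spelling and case of the placeholder in the real Salaries.csv is an assumption worth checking. It also shows that read_csv with its default quotechar of '"' already keeps the quoted JobTitle from the Id 5 row as a single field.

```python
import io

import pandas as pd

# Small stand-in for Salaries.csv; the last row is hypothetical.
sample = io.StringIO(
    "Id,JobTitle,BasePay\n"
    '1,"DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)",134401.6\n'
    "2,CAPTAIN III (POLICE DEPARTMENT),155966.02\n"
    "3,HYPOTHETICAL TITLE,Not provided\n"
)
sal = pd.read_csv(sample)  # the quoted comma stays inside JobTitle

# Approach from the answer: swap the placeholder for 0.00, then cast to float.
base_pay = sal["BasePay"].replace("Not provided", "0.00").astype(float)
mean_with_zeros = base_pay.mean()

# Alternative: turn anything non-numeric into NaN, which mean() skips.
base_pay_nan = pd.to_numeric(sal["BasePay"], errors="coerce")
mean_skipping_missing = base_pay_nan.mean()

print(mean_with_zeros, mean_skipping_missing)
```

Filling with 0.00 pulls the mean down for every missing entry, while the NaN variant averages only the rows that actually report a base pay. Which of the two is appropriate depends on what the missing entries are supposed to mean.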