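I'm a Python newbie coming from the Java world.
I'm trying to write a simple Python function that prints only the data rows of a CSV or "arff" file. Non-data rows begin with one of these three patterns: @, [@, [% and such rows should not be printed.
Sample data file snippet: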
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
Python script:
[code block not preserved]
Actual output:
['% 1. Title: Iris Plants Database']
['% ']
['% 2. Sources:']
['% (a) Creator: R.A. Fisher']
['% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)']
['% (c) Date: July', ' 1988']
['% ']
[]
['@RELATION iris']
[]
['@ATTRIBUTE sepallength\tREAL']
['@ATTRIBUTE sepalwidth \tREAL']
['@ATTRIBUTE petallength \tREAL']
['@ATTRIBUTE petalwidth\tREAL']
['@ATTRIBUTE class \t{Iris-setosa', 'Iris-versicolor', 'Iris-virginica}']
[]
['@DATA']
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']
Expected output:
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']
I would make use of the in operator and a Python list comprehension.
Here is what I mean:
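A minimal sketch of that idea, assuming Python 3 and assuming the non-data markers are the '%' and '@' characters seen in the sample file. The names SAMPLE, SKIP_CHARS and data_rows are hypothetical and used only for illustration:

```python
import csv
import io

# Hypothetical sample standing in for the ARFF file from the question.
SAMPLE = """\
% 1. Title: Iris Plants Database
%

@RELATION iris
@ATTRIBUTE sepallength REAL
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

# First characters that mark a non-data row (an assumption based on the sample).
SKIP_CHARS = ("%", "@")


def data_rows(lines):
    """Return only the data rows from an iterable of text lines."""
    return [
        row
        for row in csv.reader(lines)
        # Keep non-empty rows whose first field does not begin with a marker.
        if row and row[0][:1] not in SKIP_CHARS
    ]


rows = data_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

A tuple is used for SKIP_CHARS rather than a plain string because the empty string counts as being "in" every string, which would wrongly drop a row whose first field is empty.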
To test whether a row is empty, just use it in a boolean context; empty lists are false.
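As a small illustration (the two input lines are a hypothetical sample, and Python 3 is assumed), csv.reader yields an empty list for a blank line, so a plain truth test is enough to skip it:

```python
import csv

# A blank line followed by one data line (hypothetical sample input).
lines = ["\n", "5.1,3.5,1.4,0.2,Iris-setosa\n"]

kept = []
for row in csv.reader(lines):
    if not row:  # an empty list is falsy, so the blank line is skipped
        continue
    kept.append(row)

print(kept)
```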
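To test whether a string starts with certain characters, use str.startswith(), which accepts either a single string or a tuple of strings. Because you are really testing for fixed-width strings, you could also just slice the first column and test with in against a sequence; a set would be the most efficient.
In that test, the [:1] slice notation returns the first character of the row[0] column (or an empty string if that first column is empty).
I used the open file object as a context manager (with ... as ...) so that Python closes the file for us automatically when the code block completes (or an exception is raised).
You should never call double-underscore methods ("dunder" methods or special methods) directly; the proper API call would be len(row).
Demo: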
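A minimal sketch of how these pieces could fit together, assuming Python 3 and assuming '%' and '@' are the non-data markers. The function names load_csv and load_csv_sliced, the constants, and the temporary sample file are hypothetical and exist only to keep the example self-contained:

```python
import csv
import os
import tempfile

# Hypothetical sample standing in for the question's ARFF file.
SAMPLE = (
    "% 1. Title: Iris Plants Database\n"
    "%\n"
    "\n"
    "@RELATION iris\n"
    "@DATA\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)

NON_DATA_PREFIXES = ("%", "@")  # tuple form, for str.startswith()
NON_DATA_CHARS = {"%", "@"}     # set form, for the slice-and-in test


def load_csv(filename):
    """Return the data rows of filename, filtering with str.startswith()."""
    # The with block closes the file even if an exception is raised.
    with open(filename, newline="") as csvfile:
        return [
            row
            for row in csv.reader(csvfile)
            if row and not row[0].startswith(NON_DATA_PREFIXES)
        ]


def load_csv_sliced(filename):
    """Same filter, testing the first character against a set."""
    with open(filename, newline="") as csvfile:
        return [
            row
            for row in csv.reader(csvfile)
            if row and row[0][:1] not in NON_DATA_CHARS
        ]


# Write the sample to a temporary file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".arff", delete=False) as tmp:
    tmp.write(SAMPLE)

try:
    rows = load_csv(tmp.name)
    rows_sliced = load_csv_sliced(tmp.name)
finally:
    os.remove(tmp.name)

for row in rows:
    print(row)
```

Both functions apply the same filter; the set-based version avoids the method call and simply looks up the first character.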