How to filter strings based on a substring in Python

% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa

['% 1. Title: Iris Plants Database']
['% ']
['% 2. Sources:']
['% (a) Creator: R.A. Fisher']
['% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)']
['% (c) Date: July', ' 1988']
['% ']
[]
['@RELATION iris']
[]
['@ATTRIBUTE sepallength\tREAL']
['@ATTRIBUTE sepalwidth \tREAL']
['@ATTRIBUTE petallength \tREAL']
['@ATTRIBUTE petalwidth\tREAL']
['@ATTRIBUTE class \t{Iris-setosa', 'Iris-versicolor', 'Iris-virginica}']
[]
['@DATA']
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']

['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']

2 answers

User

Answer 1 · edited 2024-09-30 22:19:37

I would use the in operator together with a Python list comprehension.

Here is what I mean:

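import csv

def loadCSVfile(path):
    # First-field prefixes to exclude; '%' marks ARFF comments, '@' marks ARFF declarations
    exclusions = ['@', '%', '\n', '[@', '[%']
    # The with block closes the file even if reading fails
    with open(path, 'r', newline='') as csvData:
        spamreader = csv.reader(csvData, delimiter=',', quotechar='|')
        # Keep non-empty rows whose first field starts with none of the exclusions
        lines = [line for line in spamreader
                 if line and line[0][0:1] not in exclusions and line[0][0:2] not in exclusions]

    for line in lines:
        print(line)


loadCSVfile('C:/Users/anaim/Desktop/Data Mining/OneR/iris.arff')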

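User

Answer 2 · edited 2024-09-30 22:19:37

To test whether a row is empty, just use it in a boolean context; an empty list is false.

To test whether a string starts with certain characters, use str.startswith(), which accepts either a single string or a tuple of strings: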

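import csv

def loadCSVfile(path):
    # Python 3: open in text mode with newline='', as the csv module recommends
    with open(path, newline='') as csvData:
        spamreader = csv.reader(csvData, delimiter=',', quotechar='|')
        for row in spamreader:
            # Skip empty rows and rows whose first column starts with % or @
            if row and not row[0].startswith(('%', '@')):
                print(row)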
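As a quick, self-contained check of the two ideas above (empty rows are falsy, and startswith() accepts a tuple), the sketch below runs the same test on an in-memory sample instead of a real file. The helper name filter_rows and the sample text are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io

# Hypothetical sample standing in for the first lines of iris.arff
SAMPLE = """% 1. Title: Iris Plants Database
%

@RELATION iris
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

def filter_rows(lines):
    """Return the parsed rows that are non-empty and do not start with % or @."""
    reader = csv.reader(lines, delimiter=',', quotechar='|')
    # A blank line parses to [], which is falsy; startswith takes a tuple of prefixes
    return [row for row in reader if row and not row[0].startswith(('%', '@'))]

rows = filter_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

Returning the rows rather than printing them inside the function is a design choice for this sketch only; it makes the filter easy to reuse and to check.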

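Because you are really testing a fixed-width string, you could also just slice the first column and test the result with in against a sequence, for example row[0][:1] not in {'%', '@'}; a set is the most efficient choice.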

Here, the [:1] slice notation returns the first character of the row[0] column (or an empty string if the first column is empty).
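The slice-and-membership test described above might look like the following. This is a sketch rather than the answer's original code: the names SKIP and keep, and the sample rows, are assumptions introduced here.

```python
import csv
import io

# Set of one-character prefixes to skip; set membership is O(1) on average
SKIP = {'%', '@'}

def keep(row):
    """True for non-empty rows whose first column does not begin with % or @."""
    # row[0][:1] is '' when the first column is empty, so no IndexError is raised
    return bool(row) and row[0][:1] not in SKIP

# Hypothetical sample rows, including one with an empty first column
sample = io.StringIO("% comment\n@DATA\n,3.5\n5.1,3.5\n")
kept = [row for row in csv.reader(sample) if keep(row)]
print(kept)
```

Note that a row with an empty first column is kept, because the empty string is not a member of the set.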

I used the open file object as a context manager (with ... as ...), so Python closes the file for us automatically when the block completes (or raises an exception).
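To see the closing behaviour for yourself, here is a small sketch that raises inside the with block and then inspects the file object. The temporary file and the simulated ValueError are assumptions made purely for the demonstration.

```python
import os
import tempfile

# Write a throwaway file so the example does not depend on iris.arff
with tempfile.NamedTemporaryFile('w', suffix='.arff', delete=False) as tmp:
    tmp.write('@DATA\n5.1,3.5\n')
    path = tmp.name

closed_after_error = None
try:
    with open(path, newline='') as f:
        handle = f  # keep a reference so we can inspect it after the block
        raise ValueError('simulated failure inside the block')
except ValueError:
    # The with statement has already closed the file at this point
    closed_after_error = handle.closed

os.remove(path)
print(closed_after_error)
```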

Never call double-underscore methods ("dunder" or special methods) directly; instead of row.__len__(), the correct API call is len(row).

Demo:

>>> loadCSVfile('/tmp/iris.arff')
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
