Python: Parsing a file line by line with different start and end markers

User

Reply #1 · edited 2024-09-26 18:09:12

Here is my implementation. I'm not entirely sure what the is_terminator() logic should look like.

def is_terminator(line):
    """
    Return True if the last token of line is an integer followed by ';'.
    """
    tokens = line.split()
    if not tokens:
        return False
    token = tokens[-1]
    if not token.endswith(";"):
        return False
    try:
        int(token[:-1])
    except ValueError:
        # not an int, so not a terminator (e.g. "high;" or "5,8;")
        return False
    return True


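sublist = []
result = []
with open("input.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sublist.append(line)
        if is_terminator(line):
            # a terminator closes the current sublist
            result.append(sublist)
            sublist = []
if sublist:
    result.append(sublist)  # keep a trailing group that has no terminator

print(result)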
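One caveat with an int()-based check: a final line such as "5,8;" is not recognized, because int("5,8") raises ValueError. Below is a minimal sketch of a more permissive test, assuming a terminator is a line made only of numbers separated by spaces or commas and ending in ";". The name is_numeric_terminator and the regular expression are hypothetical, not part of the original post.

```python
import re

# Hypothetical pattern: digits separated by spaces or commas, then ';'
_TERMINATOR_RE = re.compile(r"\d+(?:[ ,]+\d+)*;")


def is_numeric_terminator(line):
    """Return True if the whole line is a number list ending in ';'."""
    return _TERMINATOR_RE.fullmatch(line.strip()) is not None


# Quick check against a few sample lines
for sample in ["10 11 11;", "5,8;", "high;", "#Jim"]:
    print(sample, is_numeric_terminator(sample))
```

If the real data can contain other numeric formats (negative numbers, decimals), the pattern would need to be widened accordingly.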

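User

Reply #2 · edited 2024-09-26 18:09:12

To parse the file, you first need to find the pattern that tells you how the data is grouped.

From your example, you stop appending to the current sublist once you read a line that starts with an integer and ends with a semicolon. I tried it like this: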

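result = []
path = "input.txt"  # assumed file name (the one used in reply #1)

with open(path) as fl:
    sublist = []
    for line in fl:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sublist.append(line)
        # a line that starts with a digit and ends with ';' closes the sublist
        if line[0].isdigit() and line.endswith(';'):
            result.append(sublist)
            sublist = []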

User

Reply #3 · edited 2024-09-26 18:09:12

As I understand it, every sublist starts with # and ends with ;. A Pythonic way to implement this is a generator:

def read_lists():
    with open('data') as file:
        sublist = []
        previous_line = ''
        for line in file:
            line = line.strip()
            if line.startswith('#') and previous_line.endswith(';'):
                yield sublist
                sublist = []
            sublist.append(line)
            previous_line = line
        yield sublist

for sublist in read_lists():
    print(sublist)

['#Wiliam', '#Arthur', '#Jackie', 'high;', '10 11 11;']
['#Jim', '#Jill', '#Catherine', '#Abby', 'low;', 'girl;', '10 11 11 11;']
['#Ablett', '#Adelina', 'none;', '5,8;']
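The same grouping rule can be tried without a file on disk. The following is a rough sketch, assuming the rule "a line starting with # that follows a line ending in ; begins a new group". The function name split_groups and the sample names in the test data are made up for illustration; it accepts any iterable of lines, so an io.StringIO object works as a stand-in for an open file.

```python
import io


def split_groups(lines):
    """Yield lists of lines; a '#' line following a ';' line starts a new group."""
    group = []
    previous = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith("#") and previous.endswith(";"):
            yield group
            group = []
        group.append(line)
        previous = line
    if group:
        yield group  # emit the last group, if any


# Made-up sample data in the same shape as the output above
sample = io.StringIO("#Ann\n#Bob\nhigh;\n1 2;\n#Cid\nlow;\n3,4;\n")
groups = list(split_groups(sample))
print(groups)
```

Unlike the file-based version, this variant skips blank lines and yields nothing for empty input, which may or may not be what you want.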
