Parsing a structured (machine-generated) text file (config file) into a structured table format

Posted 2024-06-24 13:40:34


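The main goal is to turn a more-or-less readable config file into a table format that anyone can read without deeper knowledge of the machine and its configuration standard.

I have a config file: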

******A MANO:111111         ,20190726,001,0914,06621242746     
DXS*HAWA776A0A*VA*V0/6*1
ST*001*0001
ID1*HAW250755*VMI1-9900****250755*6*0
CB1*021545*DeBright*7.030.16*3.02*250755
PA1*0*100
PA1*1*60
PA2*2769*166140*210*12600*0*0*0*0
******E MANO:111111         ,20190726,001,0914,06621242746     
******A MANO:222222         ,20190726,001,0914,06621242746     
DXS*HAWA776A0A*VA*V0/6*1
ST*001*0001
ID1*HAW250755*VMI1-9900****250755*6*0
CB1*021545*DeBright*7.030.16*3.02*250755
PA1*0*100
PA1*1*60
PA2*2769*166140*210*12600*0*0*0*0
******E MANO:222222         ,20190726,001,0914,06621242746   

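The file contains several objects. Each one always starts with "A MANO:" and ends with "E MANO:", followed by the object number. All lines in between are the object's attributes (the machine's settings). Not all objects have the same number of settings: one object may have 55 lines, another 199.

What I have tried so far: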

from pyparsing import *

'''
grammar:
object_nr ::= Word(nums, exact=6)
num ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
'''

path_input = r'\\...\...'

with open(path_input) as input_file:
    line = input_file.readline()
    cnt = 1

object_nr_parser = Word(nums, exact=6)

for match, start, stop in object_nr_parser.scanString(input_file):
    print(match, start, stop)

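This prints: ['201907'] 116 122 ['019211'] 172 178

These are the numbers it found, with their start and end positions in the string. But they are not the numbers I am looking for, and they are not correct. I cannot even find the second number in the config file.

Is pyparsing the right way to solve this, or is there a more convenient approach? Where did I go wrong?

Ideally I would end up with one object per machine, whose attributes are the lines between A MANO: and E MANO:.

Expected result: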

{"object": "111111",
"line1":"DXS*HAWA776A0A*VA*V0/6*1",
"line2":"ST*001*0001",
"line3":"ID1*HAW250755*VMI1-9900****250755*6*0",
"line4":"CB1*021545*DeBright*7.030.16*3.02*250755",
"line5":"PA1*0*100",
"line6":"PA1*1*60",
"line7":"PA2*2769*166140*210*12600*0*0*0*0"},
{"object": "222222",
"line1":"DXS*HAWA776A0A*VA*V0/6*1",
"line2":"ST*001*0001",
"line3":"ID1*HAW250755*VMI1-9900****250755*6*0",
"line4":"CB1*021545*DeBright*7.030.16*3.02*250755",
"line5":"PA1*0*100",
"line6":"PA1*1*60",
"line7":"PA2*2769*166140*210*12600*0*0*0*0",
"line8":"PA2*2769*166140*210*12600*0*0*0*0",
"line9":"PA2*2769*166140*210*12600*0*0*0*0",
"line10":"PA2*2769*166140*210*12600*0*0*0*0"}

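I am not sure whether this is the best solution, but it is what came to mind at this point.

One of the dirtiest ways to do this would be to use regex: replace MANO with a newline and replace all newlines with ";". I don't think that is a solution that should be used.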


1 Answer
User
#1 · Posted 2024-06-24 13:40:34

You can parse it line by line:

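import re

with open('file.txt', 'r') as f:
    # Strip whitespace and drop blank lines so they are not stored as settings
    lines = [x.strip() for x in f if x.strip()]

result = []
i = 1
for line in lines:
    if 'A MANO' in line:
        # Start of a new object: the digits after "A MANO:" are its number
        name = re.findall(r'A MANO:(\d+)', line)[0]
        result.append({'object': name})
        i = 1
    elif 'E MANO' not in line and result:
        # Any other line after a header is a setting of the current object
        result[-1][f'line{i}'] = line
        i += 1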

Output:

[{
        'object': '111111',
        'line1': 'DXS*HAWA776A0A*VA*V0/6*1',
        'line2': 'ST*001*0001',
        'line3': 'ID1*HAW250755*VMI1-9900****250755*6*0',
        'line4': 'CB1*021545*DeBright*7.030.16*3.02*250755',
        'line5': 'PA1*0*100',
        'line6': 'PA1*1*60',
        'line7': 'PA2*2769*166140*210*12600*0*0*0*0'
    }, {
        'object': '222222',
        'line1': 'DXS*HAWA776A0A*VA*V0/6*1',
        'line2': 'ST*001*0001',
        'line3': 'ID1*HAW250755*VMI1-9900****250755*6*0',
        'line4': 'CB1*021545*DeBright*7.030.16*3.02*250755',
        'line5': 'PA1*0*100',
        'line6': 'PA1*1*60',
        'line7': 'PA2*2769*166140*210*12600*0*0*0*0'
    }
]
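A variant that makes the block boundaries explicit may be easier to extend. The following is only a minimal sketch under stated assumptions: `parse_objects`, `START`, `END` and `SAMPLE` are names made up for this illustration, the header is assumed to always look like `A MANO:<digits>`, and lines outside a block are simply skipped.

```python
import re

# Hypothetical sample, shortened from the file in the question.
SAMPLE = """\
******A MANO:111111         ,20190726,001,0914,06621242746
DXS*HAWA776A0A*VA*V0/6*1
ST*001*0001

******E MANO:111111         ,20190726,001,0914,06621242746
******A MANO:222222         ,20190726,001,0914,06621242746
PA1*0*100
******E MANO:222222         ,20190726,001,0914,06621242746
"""

START = re.compile(r'A MANO:(\d+)')  # assumed header form
END = re.compile(r'E MANO:(\d+)')    # assumed footer form


def parse_objects(text):
    """Return one dict per A MANO ... E MANO block (illustrative helper)."""
    result = []
    current = None  # dict of the block being read, or None outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        start = START.search(line)
        if start:
            current = {'object': start.group(1)}
            result.append(current)
        elif END.search(line):
            current = None  # block closed
        elif current is not None:
            # len(current) counts the 'object' key, so the first setting is line1
            current[f'line{len(current)}'] = line
    return result


print(parse_objects(SAMPLE))
```

Keeping `current` as `None` outside a block means a stray line before the first header is skipped instead of raising an error.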

But I would suggest a more compact output format:

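import re

with open('file.txt', 'r') as f:
    # Strip whitespace and drop blank lines so they are not stored as settings
    lines = [x.strip() for x in f if x.strip()]

result = {}
name = None  # number of the object currently being read
for line in lines:
    if 'A MANO' in line:
        # Start of a new object: use its number as the dictionary key
        name = re.findall(r'A MANO:(\d+)', line)[0]
        result[name] = []
    elif 'E MANO' not in line and name is not None:
        # Collect every setting line under the current object
        result[name].append(line)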

Output:

{
    '111111': ['DXS*HAWA776A0A*VA*V0/6*1', 'ST*001*0001', 'ID1*HAW250755*VMI1-9900****250755*6*0', 'CB1*021545*DeBright*7.030.16*3.02*250755', 'PA1*0*100', 'PA1*1*60', 'PA2*2769*166140*210*12600*0*0*0*0'],
    '222222': ['DXS*HAWA776A0A*VA*V0/6*1', 'ST*001*0001', 'ID1*HAW250755*VMI1-9900****250755*6*0', 'CB1*021545*DeBright*7.030.16*3.02*250755', 'PA1*0*100', 'PA1*1*60', 'PA2*2769*166140*210*12600*0*0*0*0']
}
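Since the stated goal is a table, the compact dictionary can be flattened into rows. This is only a rough sketch: it assumes that `*` separates fields and that the first field names the record type, neither of which the question confirms, and `to_rows` and the shortened `parsed` data are made up for the illustration.

```python
import csv
import io

# Hypothetical input in the compact format shown above (shortened).
parsed = {
    '111111': ['ST*001*0001', 'PA1*0*100', 'PA1*1*60'],
    '222222': ['ST*001*0001', 'PA1*0*100'],
}


def to_rows(parsed):
    """Flatten {object: [lines]} into (object, record, position, value) rows.

    Assumption: '*' separates fields and the first field names the record.
    """
    rows = []
    for obj, lines in parsed.items():
        for line in lines:
            record, *values = line.split('*')
            for position, value in enumerate(values, start=1):
                rows.append((obj, record, position, value))
    return rows


rows = to_rows(parsed)

# Write the rows as CSV into an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['object', 'record', 'position', 'value'])
writer.writerows(rows)
print(buffer.getvalue())
```

A long format with one row per value is used here because the objects do not all have the same number of settings, so a fixed set of columns per object would leave gaps. Note that repeated record types, such as the two `PA1` lines, are not distinguished from each other in this sketch.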
