Parse text data and align it into columns based on conditions

Posted 2024-05-20 10:26:27


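I have the text data below, and I need to parse it and split it into columns according to the following conditions:

  1. Anything that starts with = should go into ENC_NAME

  2. For any line that contains BladeSystem, the number at the end of the line should go under the OA_VERSION column

  3. For any line that contains 1 HP, the number at the end of the line should go under the VC_ACTIVE column

  4. For any line that contains 2 HP, the number at the end of the line should go under the VC_STDN column

Text data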

========= enc1001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40
========= enc1009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.45
  2 HP VC Flex-10/10D Module   4.45
========= enc3027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4032 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc7002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc1026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3036 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4035 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC FlexFabric 10Gb/24-Port Module  4.50
  2 HP VC FlexFabric 10Gb/24-Port Module  4.50
========= enc7007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc7008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40

Expected output (sample):

ENC_NAME    OA_VERSION      VC_ACTIVE   VC_STDN
enc4031     4.85            4.50        4.50
enc4032     4.85            4.50        4.50
enc4033     4.85            4.50        4.50
enc4034     4.85            4.50        4.50
enc6002     4.60            NaN         NaN
enc6011     4.60            NaN         NaN
enc6012     4.60            NaN         NaN
enc6013     4.60            NaN         NaN

Edit (what I have tried)

import pandas as pd

df  = pd.read_csv("enc_list_sorted", names=["col1"])
df = df.col1.str.split(' ', expand = True)
df = df.drop(df.columns[[0, 2, 3, 4, 5, 6, 7, 8, 11]], axis=1)


df = df.rename(columns={ 1: 'ENC_NAME', 9: 'VC_VERSION', 10: 'OA_VERSION'})

print(df)

        ENC_NAME VC_VERSION OA_VERSION
    0    enc1001       None       None
    1                   KVM       4.85
    2                  4.50       None
    3                  4.50       None
    4    enc1002       None       None
    5                   KVM       4.85
    6                  4.50       None
    7                  4.50       None
    8    enc1003       None       None
    9                   KVM       4.85
    10                 4.50       None
    11                 4.50       None
    12   enc1004       None       None
    13                  KVM       4.85
    14                 4.50       None
    15                 4.50       None

Any help or ideas would be much appreciated.


3 answers

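As the comments here suggest, opening the file with pandas and parsing it there is not ideal.

Assuming your data is saved in a text file named file.txt: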

import pandas as pd

with open("file.txt") as file:
    lines = [l.rstrip("\n") for l in file]


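row = None
out = []
for line in lines:
    if line.startswith("="):
        # A header line starts a new enclosure; store the previous one first
        if row is not None:
            out.append(row)
        row = [None] * 4
        row[0] = line.replace("=", "").strip()
    elif row is None:
        continue  # ignore anything that appears before the first header
    elif 'BladeSystem' in line:
        row[1] = line.split()[-1]
    elif '1 HP' in line:
        row[2] = line.split()[-1]
    elif '2 HP' in line:
        row[3] = line.split()[-1]

# The last enclosure has no header after it, so append it explicitly
if row is not None:
    out.append(row)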

col_names = ["ENC_NAME", "OA_VERSION", "VC_ACTIVE", "VC_STDN"]
df = pd.DataFrame(out,
                  columns=col_names)

This returns the output you are looking for.
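
As a variation on the loop above, each block can be collected as a dictionary that is appended the moment its header is seen, so the final block needs no special handling. This is only a minimal sketch using the standard library; the function name parse_enclosures and the two-block sample are made up for illustration.

```python
def parse_enclosures(lines):
    """Group the lines into one dict per '=' header (hypothetical helper)."""
    rows = []
    current = None
    for line in lines:
        if line.startswith("="):
            # New enclosure: register it right away so the last one is not lost
            current = {"ENC_NAME": line.replace("=", "").strip()}
            rows.append(current)
        elif current is None:
            continue  # text before the first header is ignored
        elif "BladeSystem" in line:
            current["OA_VERSION"] = line.split()[-1]
        elif "1 HP" in line:
            current["VC_ACTIVE"] = line.split()[-1]
        elif "2 HP" in line:
            current["VC_STDN"] = line.split()[-1]
    return rows


# Two blocks copied from the question, one of them without VC modules
sample = [
    "========= enc4034 =========",
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85",
    "  1 HP VC Flex-10/10D Module   4.50",
    "  2 HP VC Flex-10/10D Module   4.50",
    "========= enc6002 =========",
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60",
]
rows = parse_enclosures(sample)
print(rows)
```

Passing rows to pd.DataFrame(rows, columns=col_names) should then leave NaN where a block has no VC lines, since pandas fills keys that are missing from a dict with NaN.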

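In my opinion, you should use a hand-written parser. What you have can be seen as a form of a so-called DSL, a domain-specific language. The grammar used here is fairly forgiving: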

import re, pandas as pd
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

class ENCVisitor(NodeVisitor):
    grammar = Grammar(r"""
            content     = (ws / block)*

            block       = header oa_line vc_active? vc_stdn?
            header      = delim ws word ws delim nl

            oa_line     = ~"^(?=.*BladeSystem).+"m nl?
            vc_active   = ~"^(?=.*1 HP).+"m nl?
            vc_stdn     = ~"^(?=.*2 HP).+"m nl?

            word        = ~"\w+"
            delim       = ~"=+"
            ws          = ~"\s+"
            nl          = ~"[\n\r]+"
    """)

    version_pattern = re.compile(r"\d+\.\d+$")

    def get_version(self, key, line):
        match = self.version_pattern.search(line)
        value = match.group(0) if match else None
        return {key: value}

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_header(self, node, visited_children):
        header = visited_children[2]
        return {"ENC_NAME": header.text}

    def visit_oa_line(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("OA_VERSION", line.text)

    def visit_vc_active(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_ACTIVE", line.text)

    def visit_vc_stdn(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_STDN", line.text)

    def visit_block(self, node, visited_children):
        dct = {}
        for child in visited_children:
            if isinstance(child, dict):
                dct.update(child)
            elif isinstance(child, list):
                dct.update(child[0])
        return dct

    def visit_content(self, node, visited_children):
        return [child[0] for child in visited_children if isinstance(child[0], dict)]

# `data` is expected to hold the input text from the question as one string
enc = ENCVisitor()
result = enc.parse(data)

df = pd.DataFrame(result)
print(df)

For the data you provided, this yields

   ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0   enc1001       4.85      4.50    4.50
1   enc1002       4.85      4.50    4.50
2   enc1003       4.85      4.50    4.50
3   enc1004       4.85      4.50    4.50
4   enc1005       4.85      4.50    4.50
..      ...        ...       ...     ...
94  enc8025       4.85      4.62    4.62
95  enc8026       4.85      4.62    4.62
96  enc8027       4.85      4.62    4.62
97  enc8028       4.85      4.62    4.62
98  enc8033       4.85      4.40    4.40

[99 rows x 4 columns]

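Explanation: your input can be seen as a small language of its own, a so-called domain-specific language. Each block of information in the file consists of a header line, an OA_VERSION line and two lines that may or may not be present (VC_ACTIVE and VC_STDN). The header line always starts and ends with ===.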
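
The block structure described above (a header, an OA line and two optional VC lines) can also be written down as a single regular expression with optional groups. This is not the parsimonious grammar from the answer, just a standard-library sketch of the same structure; the pattern name BLOCK and the sample string are assumptions for illustration.

```python
import re

# One block = header line, OA line, then up to two optional VC lines.
# re.VERBOSE ignores layout whitespace, so a literal space is written as [ ].
BLOCK = re.compile(
    r"""
    ^=+[ \t]*(?P<ENC_NAME>\w+)[ \t]*=+[ \t]*\n                        # header
    [^\n]*BladeSystem[^\n]*?(?P<OA_VERSION>\d+\.\d+)[ \t]*(?:\n|\Z)   # OA line
    (?:[^\n]*1[ ]HP[^\n]*?(?P<VC_ACTIVE>\d+\.\d+)[ \t]*(?:\n|\Z))?    # optional
    (?:[^\n]*2[ ]HP[^\n]*?(?P<VC_STDN>\d+\.\d+)[ \t]*(?:\n|\Z))?      # optional
    """,
    re.MULTILINE | re.VERBOSE,
)

# Two blocks copied from the question, one of them without VC modules
SAMPLE = (
    "========= enc4034 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85\n"
    "  1 HP VC Flex-10/10D Module   4.50\n"
    "  2 HP VC Flex-10/10D Module   4.50\n"
    "========= enc6002 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60\n"
)

# Groups that did not take part in a match come back as None
rows = [m.groupdict() for m in BLOCK.finditer(SAMPLE)]
print(rows)
```

A regular expression like this is more compact than a grammar but harder to extend, which is one reason to prefer a real parser once the format grows.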

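All of these blocks together make up the grammar, that is, whitespace or any number of blocks in a file/string. Internally, an abstract syntax tree (AST) is built, and to retrieve the information we need to "visit" each node. In the parser library I chose (the excellent parsimonious), this is done with the NodeVisitor class, where each node of the AST is visited through a function with the corresponding name. This means that if we call a part "header", the function should be named "visit_header".

The result is collected in "visit_block", which returns a dictionary with all the information retrieved for that block. Finally, everything is fed into pandas.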
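
The name-based dispatch described above can be illustrated without any parser library. The toy class below is not how parsimonious is implemented; MiniVisitor and the hand-built tuple tree are hypothetical and only mimic the visit_<name> convention.

```python
class MiniVisitor:
    """Toy visitor: a node named 'x' is handled by the method 'visit_x'."""

    def visit(self, node):
        name, payload = node
        # Look up visit_<name>; fall back to generic_visit if it is missing
        method = getattr(self, "visit_" + name, self.generic_visit)
        return method(payload)

    def generic_visit(self, payload):
        return None  # nodes without a dedicated method contribute nothing

    def visit_header(self, payload):
        return {"ENC_NAME": payload.strip("= ")}

    def visit_oa_line(self, payload):
        return {"OA_VERSION": payload.split()[-1]}

    def visit_block(self, children):
        # Merge the dictionaries returned by the child nodes into one row
        merged = {}
        for child in children:
            result = self.visit(child)
            if result:
                merged.update(result)
        return merged


# Hand-built stand-in for a parse tree: (rule name, text or child list)
tree = ("block", [
    ("header", "========= enc6002 ========="),
    ("oa_line", "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60"),
    ("ws", "\n"),
])
row = MiniVisitor().visit(tree)
print(row)
```

The "ws" node has no visit_ws method, so it falls through to generic_visit and is simply dropped from the merged row.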

Of course, this can only be a brief introduction; if you want to learn more about parsimonious, have a look at its GitHub repository.

You can try this:

import pandas as pd
import re
import numpy as np

with open(r'test1.txt','r') as file:
    txto=file.read()

data=[]
# Raw string avoids invalid-escape warnings; group 1 is the whole header line
pattern1 = re.compile(r'(^\=.+)\s.+$\n?', re.MULTILINE)
lstlines=txto.split('\n')

for ele1, ele2 in zip(re.findall(pattern1,txto),re.findall(pattern1,txto)[1:]):
    row=lstlines[lstlines.index(ele1):lstlines.index(ele2)]

    OA_VERSION=[i for i in row if 'BladeSystem' in i]
    OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
    
    VC_ACTIVE=[i for i in row if '1 HP' in i]
    VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
    
    VC_STDN=[i for i in row if '2 HP' in i]
    VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
    
    data.append([ele1.replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN])
    
#last row 
row=lstlines[lstlines.index(re.findall(pattern1,txto)[-1]):]
OA_VERSION=[i for i in row if 'BladeSystem' in i]
OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
VC_ACTIVE=[i for i in row if '1 HP' in i]
VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
VC_STDN=[i for i in row if '2 HP' in i]
VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
data.append([re.findall(pattern1,txto)[-1].replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN]) 

#Create dataframe
df=pd.DataFrame(data, columns=['ENC_NAME','OA_VERSION','VC_ACTIVE','VC_STDN'])
print(df)

Output:

   ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0   enc1001       4.85      4.50    4.50
1   enc1002       4.85      4.50    4.50
2   enc1003       4.85      4.50    4.50
3   enc1004       4.85      4.50    4.50
4   enc1005       4.85      4.50    4.50
..      ...        ...       ...     ...
94  enc8025       4.85      4.62    4.62
95  enc8026       4.85      4.62    4.62
96  enc8027       4.85      4.62    4.62
97  enc8028       4.85      4.62    4.62
98  enc8033       4.85      4.40    4.40

[99 rows x 4 columns]
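
The separate handling of the last block can be avoided by splitting the text on the header lines. A minimal sketch, assuming headers always look like ===== name =====; the names HEADER, last_token and split_blocks are invented for this example.

```python
import re

# A header line: '=' signs, the enclosure name, '=' signs again
HEADER = re.compile(r"^=+[ \t]*(\w+)[ \t]*=+[ \t]*$", re.MULTILINE)


def last_token(body, marker):
    """Return the last whitespace-separated token of the first line containing marker."""
    for line in body.splitlines():
        if marker in line:
            return line.split()[-1]
    return None


def split_blocks(text):
    # Because the pattern has one capture group, split() returns
    # [text before first header, name1, body1, name2, body2, ...]
    parts = HEADER.split(text)
    rows = []
    for name, body in zip(parts[1::2], parts[2::2]):
        rows.append([
            name,
            last_token(body, "BladeSystem"),
            last_token(body, "1 HP"),
            last_token(body, "2 HP"),
        ])
    return rows


# Two blocks copied from the question, one of them without VC modules
sample = (
    "========= enc4034 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85\n"
    "  1 HP VC Flex-10/10D Module   4.50\n"
    "  2 HP VC Flex-10/10D Module   4.50\n"
    "========= enc6002 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60\n"
)
rows = split_blocks(sample)
print(rows)
```

Every block, including the last one, goes through the same loop body, and the resulting list of rows can be handed to pd.DataFrame together with the four column names.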
