Pythonic way to read a flat dict from an input file

&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
some other line 2
#XXX0
line 0
line 1
line 2
#AAA0
line 0
line 1
line 2
...
#AAA1
line 0
line 1
...
#BBB0
line 0
line 1
...
#BBB1
line 0
...
#XXX1
line 0
...
#AAA0
line 0
...
#AAA1
...
#BBB0
...

{
    'HEADER': ['header line 0', 'header line 1'],
    'other lines': ['some other line 1', 'some other line 2'],
    'XXX0': {
        'HEADER': ['line 0', 'line 1', 'line 2'],
        'AAA0': ['line 0', 'line 1', 'line 2', '...'],
        'AAA1': ['line 0', 'line 1', '...'],
        'BBB0': ['line 0', 'line 1', '...'],
        'BBB1': ['line 0', '...']
    },
    'XXX1': {
        'HEADER': ['line 0', '...'],
        'AAA0': ['line 0', '...'],
        'AAA1': ['...'],
        'BBB0': ['...']
    }
}

1 answer

User

#1 · Posted 2024-09-28 19:06:08

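Here is my take. "Pythonic" is fairly subjective, but this approach should be easy to follow. It does not cover the header section at the top of the file, but that can be parsed just as easily. Once you have parsed it, you can merge the dictionaries together.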

import re
import json
import itertools

a = """
&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
some other line 2
#XXX0
line 0
line 1
line 2
#AAA0
line 0
line 1
line 2
...
#AAA1
line 0
line 1
...
#BBB0
line 0
line 1
...
#BBB1
line 0
...
#XXX1
line 0
...
#AAA0
line 0
...
#AAA1
...
#BBB0
...
"""

# Find every top-level marker ('#XXX0', '#XXX1', ...); these become the
# keys of the dictionary.
splits = re.finditer(r"(#XXX[0-9])", a)
# Each span is a (start, end) tuple giving where one marker occurs in the text.
spans = [group.span() for group in splits]

# Flatten the spans into [start0, end0, start1, end1, ...] and append len(a)
# so that the last block runs to the end of the text.
bounds = list(itertools.chain(*spans)) + [len(a)]

# Pair each marker's end with the next marker's start (or with len(a)), i.e.
# the text of '#XXX0' runs from the end of '#XXX0' to the start of '#XXX1'.
blocks = list(zip(bounds[1::2], bounds[2::2]))

def block_parse(block: tuple):
    """
    A block is a (start, end) tuple, e.g. (0, 10), meaning that the substring
    a[0:10] of the flat text is one top-level section that needs to be parsed
    into a dictionary. Lines before the first '#' sub-marker go under 'HEADER'.
    """
    block_string = a[block[0]: block[1]]
    header, body = block_string.split("#", 1)

    header = {'HEADER': header.strip().split('\n')}

    body = body.split('#')
    body = [body_.strip().split('\n') for body_ in body]
    body = dict((body_[0], body_[1:]) for body_ in body)

    return {**header, **body}

assert len(spans) == len(blocks)

my_dict = {a[spans[i][0]:spans[i][1]]: block_parse(blocks[i]) for i in range(len(blocks))}

print(json.dumps(my_dict, indent=4))

Output:

{
    "#XXX0": {
        "HEADER": [
            "line 0",
            "line 1",
            "line 2"
        ],
        "AAA0": [
            "line 0",
            "line 1",
            "line 2",
            "..."
        ],
        "AAA1": [
            "line 0",
            "line 1",
            "..."
        ],
        "BBB0": [
            "line 0",
            "line 1",
            "..."
        ],
        "BBB1": [
            "line 0",
            "..."
        ]
    },
    "#XXX1": {
        "HEADER": [
            "line 0",
            "..."
        ],
        "AAA0": [
            "line 0",
            "..."
        ],
        "AAA1": [
            "..."
        ],
        "BBB0": [
            "..."
        ]
    }
}
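
The top-of-file header that the answer leaves out could be handled in a similar spirit. The following is only a minimal sketch, not part of the original answer: `parse_preamble` is a hypothetical helper, it assumes the layout shown in the question (an `&HEADER` ... `&End_HEADER` block followed by free-form lines up to the first line starting with `#`), and `sections` is a shortened stand-in for the `my_dict` built above.

```python
def parse_preamble(text: str) -> dict:
    """Collect the lines that appear before the first '#' section marker.

    Hypothetical helper: lines between '&HEADER' and '&End_HEADER' go under
    'HEADER', everything else before the first '#' goes under 'other lines'.
    """
    header, other = [], []
    in_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            break  # first section marker: the preamble is over
        if line == "&HEADER":
            in_header = True
        elif line == "&End_HEADER":
            in_header = False
        elif in_header:
            header.append(line)
        else:
            other.append(line)
    return {"HEADER": header, "other lines": other}


sample = """
&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
some other line 2
#XXX0
line 0
"""

preamble = parse_preamble(sample)

# Shortened stand-in for my_dict (assumption, for illustration only).
sections = {"#XXX0": {"HEADER": ["line 0"]}}

# Merge the two dictionaries; keys from `sections` would win on a clash.
merged = {**preamble, **sections}
print(merged)
```

Because the top-level `'HEADER'` key and the `'#XXX…'` keys never collide, a plain `{**preamble, **sections}` merge is enough here; the nested `'HEADER'` entries live one level down and are not affected.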
