A Pythonic way to read a flattened dict from an input file

Posted 2024-09-28 19:06:08


I am reading an input file that essentially contains a flattened nested dictionary, like this:

&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
some other line 2
#XXX0
line 0
line 1
line 2
#AAA0
line 0
line 1
line 2
...
#AAA1
line 0
line 1
...
#BBB0
line 0
line 1
...
#BBB1
line 0
...
#XXX1
line 0
...
#AAA0
line 0
...
#AAA1
...
#BBB0
...

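You can see that there are three parts, each made up of several lines: a header, a few standalone lines, and a flattened nested dictionary. I am most interested in the last one, but ideally I would like to parse the whole file so that I end up with the following: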

{
    'HEADER': ['header line 0', 'header line 1'],
    'other lines': ['some other line 1', 'some other line 2'],
    'XXX0': {
        'HEADER': ['line 0', 'line 1', 'line 2'],
        'AAA0': ['line 0', 'line 1', 'line 2', '...'],
        'AAA1': ['line 0', 'line 1', '...'],
        'BBB0': ['line 0', 'line 1', '...'],
        'BBB1': ['line 0', '...']
    },
    'XXX1': {
        'HEADER': ['line 0', '...'],
        'AAA0': ['line 0', '...'],
        'AAA1': ['...'],
        'BBB0': ['...']
    }
}

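I am currently iterating over each line and using if statements to append each line to the nested dictionary. It works, but it is ugly, and I suspect it could be done more elegantly, perhaps with recursion, defaultdict, or regular expressions. I can't wrap my head around it. Can you help me find a better approach? Thanks a lot.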


1 answer
User
#1 · Posted 2024-09-28 19:06:08

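Here is what I came up with. "Pythonic" is fairly subjective, but this approach should be easy to follow. It does not handle the header section at the top, but that can be parsed just as easily; once it is parsed, you can merge the dictionaries together.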

import re
import json
import itertools

a = """
&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
some other line 2
#XXX0
line 0
line 1
line 2
#AAA0
line 0
line 1
line 2
...
#AAA1
line 0
line 1
...
#BBB0
line 0
line 1
...
#BBB1
line 0
...
#XXX1
line 0
...
#AAA0
line 0
...
#AAA1
...
#BBB0
...
"""

# Find every top-level marker ('#XXX0', '#XXX1', ...). These will be the
# keys of the dictionary. Note the pattern only matches a single digit.
splits = re.finditer(r"(#XXX[0-9])", a)
# A list of (start, end) tuples: each one tells you where a marker such as
# '#XXX0' or '#XXX1' occurs in the text.
spans = [group.span() for group in splits]

# Flatten the spans and append len(a), so that we can slice from where one
# marker ends to where the next one begins (or to the end of the text).
blocks = list(itertools.chain(*spans)) + [len(a)]

# Skip the first index and take the rest two at a time, i.e. from where the
# text for '#XXX0' begins to where '#XXX1' starts, and so on.
blocks_gen = iter(blocks[1:])
blocks = [tuple(next(blocks_gen) for _ in range(2)) for __ in range(len(blocks[1:]) // 2)]

def block_parse(block: tuple):
    """
    A block is just a tuple of two indices, say (0, 10). It indicates that
    the substring of the flattened text from 0 to 10 is a block that needs
    to be processed and turned into a dictionary.
    """
    block_string = a[block[0]: block[1]]
    header, body = block_string.split("#", 1)

    header = {'HEADER': header.strip().split('\n')}

    body = body.split('#')
    body = [body_.strip().split('\n') for body_ in body]
    body = dict((body_[0], body_[1:]) for body_ in body)

    return {**header, **body}

assert len(spans) == len(blocks)

my_dict = {a[spans[i][0]:spans[i][1]]: block_parse(blocks[i]) for i in range(len(blocks))}

print(json.dumps(my_dict, indent=4))

Output:

{
    "#XXX0": {
        "HEADER": [
            "line 0",
            "line 1",
            "line 2"
        ],
        "AAA0": [
            "line 0",
            "line 1",
            "line 2",
            "..."
        ],
        "AAA1": [
            "line 0",
            "line 1",
            "..."
        ],
        "BBB0": [
            "line 0",
            "line 1",
            "..."
        ],
        "BBB1": [
            "line 0",
            "..."
        ]
    },
    "#XXX1": {
        "HEADER": [
            "line 0",
            "..."
        ],
        "AAA0": [
            "line 0",
            "..."
        ],
        "AAA1": [
            "..."
        ],
        "BBB0": [
            "..."
        ]
    }
}
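
The code above leaves out the top header section and keeps the leading '#' in the top-level keys. As a possible alternative (not part of the answer above), here is a minimal single-pass sketch that walks the file line by line and keeps a reference to the list currently being filled. It assumes that top-level keys always start with '#XXX', that every other '#' line opens a sub-section, and that blank lines can be ignored. The name parse_flat and the shortened sample are made up for illustration.

```python
import json


def parse_flat(text):
    """Parse the flat format from the question in one pass over its lines."""
    result = {"HEADER": [], "other lines": []}
    target = result["other lines"]  # list that receives the next plain line
    section = None                  # dict of the current top-level '#XXX' block

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line == "&HEADER":
            target = result["HEADER"]
        elif line == "&End_HEADER":
            target = result["other lines"]
        elif line.startswith("#XXX"):
            # assumption: top-level keys are exactly the ones starting with 'XXX'
            section = result.setdefault(line[1:], {"HEADER": []})
            target = section["HEADER"]
        elif line.startswith("#"):
            if section is None:
                raise ValueError(f"sub-section {line!r} outside a '#XXX' block")
            target = section.setdefault(line[1:], [])
        else:
            target.append(line)
    return result


# Shortened version of the input from the question.
sample = """&HEADER
header line 0
header line 1
&End_HEADER
some other line 1
#XXX0
line 0
#AAA0
line 0
line 1
#XXX1
line 0
#AAA0
...
"""

parsed = parse_flat(sample)
print(json.dumps(parsed, indent=4))
```

Since each branch only swaps which list target points to, a new kind of marker needs just one more elif. If the top-level prefix is not always 'XXX', that check has to be adapted to whatever actually distinguishes the two levels in the real file.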
