Algorithm/Python for parsing a log file into nested open and close pairs

[ {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'iwiv', 'linenumber':5}, {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'83fi', 'linenumber':200}, {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':360}, {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':365}, {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':370}, {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':375}, {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'aowq', 'linenumber':400}, {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'pwiv', 'linenumber':520}, {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'pwiv', 'linenumber':528}, {'keyword':'d', 'is_pair':False, 'details':'9393', 'linenumber':600}, {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':740}, {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':741}, {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':750}, {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':777}, {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'aowq', 'linenumber':822}, {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'83fi', 'linenumber':850}, {'keyword':'a', 'is_pair':True, 'details':'iwiv', 'linenumber':990}, {'keyword':'c', 'is_pair':False, 'details':'1212', 'linenumber':997}, ]

<a start="5" end="990"> iwiv <a start="200" end="850"> 83fi <a start="360" end="365"> 28c8 </a> <a start="370" end="375"> 28c8 </a> <a start="400" end="822"> aowq pwiv <d linenumber="600"> 9393 </d> viao viao </a> </a> </a> <c linenumber="997"> 1212 </c>

<a start="5" end="990" details="iwiv"> <a start="200" end="850" details="83fi"> <a start="360" end="365" details="28c8"/> <a start="370" end="375" details="28c8"/> <a start="400" end="822" details="aowq"> <d linenumber="600" details="9393"/> </a> </a> </a> <c linenumber="997" details="1212"/>

[ { 'keyword':'a', 'start':5, 'end':990, 'details':'iwiv', 'inner':[ { 'keyword':'a', 'start':200, 'end':850, 'details':'83fi', 'inner':[ {'keyword':'a', 'details':'28c8'}, {'keyword':'a', 'details':'28c8'}, { 'keyword':'a', 'start':400, 'end':822, 'details':'aowq', 'inner':[ {'keyword':'b', 'start':520, 'end':528, 'details':'pwiv'}, {'keyword':'d', 'linenumber':600, 'details':'9393'}, {'keyword':'b', 'start':740, 'end':741, 'details':'viao'}, {'keyword':'b', 'start':750, 'end':777, 'details':'viao'} ] } ] } ] }, {'keyword':'c', 'linenumber':997, 'details':'1212'} ]

2 Answers

User

Answer 1 · edited 2024-09-30 03:26:15

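If your data is sorted by line number, the best approach is to use a stack. A stack also helps you convert the data into the nested format you want.

Reusing your data, we get: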

data = \
[
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'iwiv', 'linenumber':5},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'83fi', 'linenumber':200},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':360},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':365},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':370},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':375},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'aowq', 'linenumber':400},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'pwiv', 'linenumber':520},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'pwiv', 'linenumber':528},
    {'keyword':'d', 'is_pair':False, 'details':'9393', 'linenumber':600},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':740},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':741},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':750},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':777},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'aowq', 'linenumber':822},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'83fi', 'linenumber':850},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'iwiv', 'linenumber':990}, # added 'type':'close'
    {'keyword':'c', 'is_pair':False, 'details':'1212', 'linenumber':997},
]

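Note that I added 'type':'close' to the entry with line number 990, because otherwise it would have no matching pair. Without that closing entry you would lose the first row (you can catch this by checking at the end whether the stack is empty).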

# The level of nesting, since we increase if we find an open
# the first open will get a depth of 0
depth = -1

# We store the complete answers and the stacked answers.
result, stack = [], []


for row in data:
    # Check if the type is open, or if the data is unpaired
    if row.get('type', None) == 'open' or not row['is_pair']:

        # We store it on the stack and increase nesting level
        stack.append(row)
        depth += 1

    # If there is no match, we close it directly.
    # Or if the type is closing
    if not row['is_pair'] or row.get('type', None) == 'close':

        # We get the last item on the stack
        matching_open = stack.pop(-1)

        # We will sort on the linenumbers to make sure that everything will be in order
        # we also store the depth for our layout (we are following example 2)
        result.append((matching_open['linenumber'], depth,
                       f'{" " * 4 * depth}<{row["keyword"]} start="{matching_open["linenumber"]}" '
                       f'end="{row["linenumber"]}" details="{row["details"]}">'))

        # Decrease nesting level
        depth -= 1

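Basically, we loop over your data and check whether each row has type open. If so, we push it onto the stack. When we reach the matching close, we add the pair to the results. To print everything in the right order and add the closing tags, we also need to know the nesting depth. For the layout, I add one extra indent (4 spaces) per nesting level.

If anything is left on the stack after the loop, we can raise an error: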

if stack:
    raise ValueError("There is still a value in the stack, matching is not possible!")

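Now we still have to output the data in the right order. Pairs are completed innermost first, so we sort the results by line number, which is the first item of each tuple. We then watch for changes in nesting level: when the nesting gets deeper, we store the keyword, and when it gets shallower, we pop the stored keywords and print the closing tags.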

# For the closing tags we need to keep track of our depth and the
# keywords of the parents that are still open
temp = []
old_depth = None
old_keyword = None

# We only need the depth and message, so we discard the linenumber
for _, depth, message in sorted(result, key=lambda x: x[0]):

    # If the old depth was larger, we dropped one or more levels and
    # need to print a closing tag such as </a> for each of them
    if old_depth is not None and old_depth > depth:
        for num in range(old_depth - depth):
            close_open = temp.pop()
            print(f'{" " * 4 * (old_depth - num - 1)}</{close_open}>')

    # If we go one level deeper, the previous row is the parent,
    # so we store its keyword (not the child's) for the closing tag
    if old_depth is not None and old_depth < depth:
        temp.append(old_keyword)

    # Update the depth and keyword, then print the message
    old_depth = depth
    old_keyword = message.lstrip()[1:].split()[0]
    print(message)

# Close any parents that are still open after the last row
while temp:
    print(f'{" " * 4 * (len(temp) - 1)}</{temp.pop()}>')

This produces the following output:

<a start="5" end="990" details="iwiv">
    <a start="200" end="850" details="83fi">
        <a start="360" end="365" details="28c8">
        <a start="370" end="375" details="28c8">
        <a start="400" end="822" details="aowq">
            <b start="520" end="528" details="pwiv">
            <d start="600" end="600" details="9393">
            <b start="740" end="741" details="viao">
            <b start="750" end="777" details="viao">
        </a> 
    </a>
</a>
<c start="997" end="997" details="1212">
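
The output above leaves childless elements unclosed, whereas the second example in the question uses self-closing tags and a linenumber attribute for unpaired entries. A minimal sketch of a recursive renderer that produces that shape, assuming the data has already been turned into nested dicts with an 'inner' list like the question's third example. The function name render and the small tree below are illustrative, not part of the original answer, and attribute values are not XML-escaped:

```python
def render(nodes, depth=0):
    """Return a list of indented tag lines for a list of nested nodes."""
    lines = []
    pad = ' ' * 4 * depth
    for node in nodes:
        # Unpaired entries carry 'linenumber'; pairs carry 'start' and 'end'
        if 'linenumber' in node:
            attrs = f'linenumber="{node["linenumber"]}"'
        else:
            attrs = f'start="{node["start"]}" end="{node["end"]}"'
        tag = f'{pad}<{node["keyword"]} {attrs} details="{node["details"]}"'

        children = node.get('inner', [])
        if children:
            # Open the tag, render the children one level deeper, then close it
            lines.append(tag + '>')
            lines.extend(render(children, depth + 1))
            lines.append(f'{pad}</{node["keyword"]}>')
        else:
            # No children: emit a self-closing tag
            lines.append(tag + '/>')
    return lines


# Hypothetical, shortened tree in the shape of the question's third example
tree = [
    {'keyword': 'a', 'start': 5, 'end': 990, 'details': 'iwiv', 'inner': [
        {'keyword': 'a', 'start': 360, 'end': 365, 'details': '28c8', 'inner': []},
        {'keyword': 'd', 'linenumber': 600, 'details': '9393'},
    ]},
    {'keyword': 'c', 'linenumber': 997, 'details': '1212'},
]
print('\n'.join(render(tree)))
```

Because the closing tag is emitted by the same call that opened the element, no separate bookkeeping of depth changes is needed.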

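User

Answer 2 · edited 2024-09-30 03:26:15

I suggest solving this with a stack. If the data is properly nested, the problem is easy to solve.

However, I would add explicit error checking for improperly nested data, because picking up the wrong closing tag is where the difficulty lies.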
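
A minimal sketch of what such a stack with explicit checks might look like, building nested dicts in the shape of the question's third example. The function name build_tree is hypothetical, and the rule that a close must carry the same keyword and details as the innermost open is an assumption based on the sample data, not something the question states:

```python
def build_tree(rows):
    """Turn a flat list of rows (sorted by line number) into nested dicts.

    Raises ValueError for a close without an open, a close that does not
    match the innermost open, or an open that is never closed.
    """
    root = []    # top-level nodes
    stack = []   # nodes that are currently open, innermost last

    for row in rows:
        # New nodes attach to the innermost open node, or to the root
        siblings = stack[-1]['inner'] if stack else root

        if not row['is_pair']:
            siblings.append({'keyword': row['keyword'],
                             'linenumber': row['linenumber'],
                             'details': row['details']})
        elif row.get('type') == 'open':
            node = {'keyword': row['keyword'], 'start': row['linenumber'],
                    'end': None, 'details': row['details'], 'inner': []}
            siblings.append(node)
            stack.append(node)
        elif row.get('type') == 'close':
            if not stack:
                raise ValueError(f"line {row['linenumber']}: close without an open")
            node = stack.pop()
            # Assumption: open and close share both keyword and details
            if (node['keyword'], node['details']) != (row['keyword'], row['details']):
                raise ValueError(
                    f"line {row['linenumber']}: close of {row['keyword']!r} does not "
                    f"match the open of {node['keyword']!r} at line {node['start']}")
            node['end'] = row['linenumber']
        else:
            raise ValueError(f"line {row['linenumber']}: paired row has no valid 'type'")

    if stack:
        unclosed = ', '.join(str(node['start']) for node in stack)
        raise ValueError(f"opens never closed, at line(s): {unclosed}")
    return root


# Shortened sample in the same format as the question's data
rows = [
    {'keyword': 'a', 'is_pair': True, 'type': 'open', 'details': 'iwiv', 'linenumber': 5},
    {'keyword': 'd', 'is_pair': False, 'details': '9393', 'linenumber': 600},
    {'keyword': 'a', 'is_pair': True, 'type': 'close', 'details': 'iwiv', 'linenumber': 990},
]
tree = build_tree(rows)
print(tree)
```

With the question's original data, where the entry at line 990 has no 'type', this sketch raises a ValueError for that row instead of silently dropping the first open.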
