Checking text segments inside braces with Python

Published 2024-09-22 10:30:39


I have a text file structured as follows:

segmentA {
   content Aa
   content Ab
   content Ac
    ....
}

segmentB {
   content Ba
   content Bb
   content Bc
   ......
}

segmentC {
  content Ca
  content Cb
  content Cc
  ......
}

I know how to search a whole text file for a certain string, but how do I restrict the search to one specific segment, e.g. "segmentC"? I need something like a regular expression to tell the script:

If the text starts with "segmentC {", search for a certain string until the first "}" appears.

Does anyone have an idea?

Thanks in advance!
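For reference, the single-segment extraction described above can be sketched with a non-greedy regex. This is only a sketch: it assumes braces are never nested and that the segment name is literal.

```python
import re

text = """segmentA {
   content Aa
}

segmentC {
  content Ca
  content Cb
}
"""

# Non-greedy (.*?) stops at the first "}" after "segmentC {";
# re.DOTALL lets "." also match newlines inside the segment body.
match = re.search(r"segmentC\s*\{(.*?)\}", text, re.DOTALL)
if match:
    body = match.group(1)
    print('content Ca' in body)   # True
    print('content Aa' in body)   # False (belongs to segmentA)
```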


2 Answers

Not a regex solution… but it will get the job done!

def SearchStuff(lines, sstr):
    i = 0
    # readlines() keeps the trailing newline, so strip before comparing
    while lines[i].strip() != '}':
        # Do stuff here... for e.g. look for a particular content line
        if 'Ca' in lines[i]:
            return lines[i]
        i += 1

def main(search_str):
    with open('file.txt', 'r') as f:
        lines = f.readlines()
    # Find the line that opens the wanted segment
    for index, line in enumerate(lines):
        if search_str in line:
            break
    lines = lines[index + 1:]
    print(SearchStuff(lines, search_str))



search_str = 'segmentC'   #set this string accordingly
main(search_str)
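The same skip-then-collect idea can also be written without an explicit index, using `itertools.takewhile` on a line iterator. A minimal sketch with inline sample data (the header/brace layout is assumed to match the question's file):

```python
import itertools

text = """segmentA {
   content Aa
}

segmentC {
  content Ca
  content Cb
}
"""

lines = iter(text.splitlines())
# Advance the iterator past everything up to the "segmentC" header line
for line in lines:
    if line.strip().startswith('segmentC'):
        break
# Then take lines until the closing brace
segment = [l.strip() for l in
           itertools.takewhile(lambda l: l.strip() != '}', lines)]
print(segment)   # ['content Ca', 'content Cb']
```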

Depending on the complexity you need, the solution can range from a simple state machine doing line-based pattern search to a full lexer.

Line-based search

The example below assumes you are looking for just one segment, and that the opening `segmentC {` and the closing `}` each sit on their own line.

def parsesegment(fh):
    # Yields all lines inside "segmentC"
    state = "out"
    for line in fh:
        line = line.strip() # in case there are whitespaces around
        if state == "out":
            if line.startswith("segmentC {"):
                state = "in"
                continue
        elif state == "in":
            if line.startswith("}"):
                state = "out"
                break
            # Work on the specific lines here
            yield line 

with open(...) as fh:
    for line in parsesegment(fh):
        # do something
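Generalising the same state machine slightly, every segment can be collected into a dict in one pass. A self-contained sketch (the helper name `parse_segments` is made up here; it assumes `{` ends a header line and `}` stands alone):

```python
def parse_segments(text):
    """Collect each segment's content lines into a dict keyed by segment name."""
    segments, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if current is None:
            if line.endswith('{'):
                current = line[:-1].strip()   # "segmentC {" -> "segmentC"
                segments[current] = []
        elif line == '}':
            current = None
        elif line:
            segments[current].append(line)
    return segments

sample = "segmentA {\n content Aa\n}\nsegmentC {\n content Ca\n}\n"
print(parse_segments(sample))
# {'segmentA': ['content Aa'], 'segmentC': ['content Ca']}
```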

Simple lexer

If you need more flexibility, you can design a simple lexer/parser pair. For example, the following code makes no assumption about how the syntax is organised across lines. It also ignores unknown patterns, which a typical lexer would not (one would usually raise a syntax error):

import re

class ParseSegment:
    # Dictionary of patterns per state
    # Tuples are (token name, pattern, state change command)
    _regexes = {
        "out": [
            ("open", re.compile(r"segment(?P<segment>\w+)\s+\{"), "in")
        ],
        "in": [
            ("close", re.compile(r"\}"), "out"),
            # Here an example of what you could want to match
            ("content", re.compile(r"content\s+(?P<content>\w+)"), None)
        ]
    }

    def lex(self, source, initpos = 0):
        pos = initpos
        end = len(source)
        state = "out"
        while pos < end:
            for token_name, reg, state_chng in self._regexes[state]:
                # Try to get a match
                match = reg.match(source, pos)
                if match:
                    # Advance according to how much was matched
                    pos = match.end()
                    # yield a token if it has a name
                    if token_name is not None:
                        # Yield token name, the full matched part of source
                        # and the match grouped according to (?P<tag>) tags
                        yield (token_name, match.group(), match.groupdict())
                    # Switch state if requested
                    if state_chng is not None:
                        state = state_chng
                    break
            else:
                # No match, advance by one character
                # This is particular to that lexer, usually no match means
                # the input file has an error in the syntax and lexer should
                # yield an exception
                pos += 1

    def parse(self, source, initpos = 0):
        # This is an example of use of the lexer with a parser
        # This converts the input file into a dictionary. Keys are segment
        # names, and values are list of contents.
        segments = {}
        cur_segment = None
        # Use lexer to get tokens from source
        for token, fullmatch, groups in self.lex(source, initpos):
            # On open, create the list of content in segments
            if token == "open":
                cur_segment = groups["segment"]
                segments[cur_segment] = []
            # On content, ensure we know the segment and add content to the
            # list
            elif token == "content":
                if cur_segment is None:
                    raise RuntimeError("Content found outside a segment")
                segments[cur_segment].append(groups["content"])
            # On close, set the current segment to unknown
            elif token == "close":
                cur_segment = None
            # ignore unknown tokens, we could raise an error instead
        return segments

def main():
    with open("...", "r") as fh:
        data = fh.read()
        lexer = ParseSegment()
        segments = lexer.parse(data)
        print(segments)
    return 0

if __name__ == '__main__':
    main()

Full lexer

If you need even more flexibility and reusability, you will have to create a full parser. No need to reinvent the wheel: have a look at this list of language parsing modules, you will probably find one that suits you.
