Checking text segments inside braces with Python

Published 2024-09-22 10:30:39


I have a text file structured as follows:

segmentA {
   content Aa
   content Ab
   content Ac
    ....
}

segmentB {
   content Ba
   content Bb
   content Bc
   ......
}

segmentC {
  content Ca
  content Cb
  content Cc
  ......
}

I know how to search a whole text file for a certain string, but how do I restrict the search to one specific segment, e.g. "segmentC"? I need something like a regular expression to tell the script:

If the text starts with "segmentC {", search for a certain string until the first "}" appears.

Does anyone have an idea?

Thanks in advance!
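For reference, the single-segment extraction described above can be sketched with a non-greedy regex. This is only a sketch: it assumes braces are never nested and that the segment name is literal.

```python
import re

text = """segmentA {
   content Aa
}

segmentC {
  content Ca
  content Cb
}
"""

# Non-greedy (.*?) stops at the first "}" after "segmentC {";
# re.DOTALL lets "." also match newlines inside the segment body.
match = re.search(r"segmentC\s*\{(.*?)\}", text, re.DOTALL)
if match:
    body = match.group(1)
    print('content Ca' in body)   # True
    print('content Aa' in body)   # False (belongs to segmentA)
```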


2 Answers

Not a regex solution… but it will get the job done!

def SearchStuff(lines, sstr):
    i = 0
    # readlines() keeps the trailing newline, so strip before comparing
    while lines[i].strip() != '}':
        # Do stuff here... for e.g. look for a particular content line
        if 'Ca' in lines[i]:
            return lines[i]
        i += 1

def main(search_str):
    with open('file.txt', 'r') as f:
        lines = f.readlines()
    # Find the line that opens the wanted segment
    for index, line in enumerate(lines):
        if search_str in line:
            break
    lines = lines[index + 1:]
    print(SearchStuff(lines, search_str))



search_str = 'segmentC'   #set this string accordingly
main(search_str)
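The same skip-then-collect idea can also be written without an explicit index, using `itertools.takewhile` on a line iterator. A minimal sketch with inline sample data (the header/brace layout is assumed to match the question's file):

```python
import itertools

text = """segmentA {
   content Aa
}

segmentC {
  content Ca
  content Cb
}
"""

lines = iter(text.splitlines())
# Advance the iterator past everything up to the "segmentC" header line
for line in lines:
    if line.strip().startswith('segmentC'):
        break
# Then take lines until the closing brace
segment = [l.strip() for l in
           itertools.takewhile(lambda l: l.strip() != '}', lines)]
print(segment)   # ['content Ca', 'content Cb']
```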

Depending on the complexity you need, the solution can range from a simple state machine doing line-based pattern search to a full lexer.

Line-based search

The example below assumes you are looking for just one segment, and that the opening `segmentC {` and the closing `}` each sit on their own line.

def parsesegment(fh):
    # Yields all lines inside "segmentC"
    state = "out"
    for line in fh:
        line = line.strip() # in case there are whitespaces around
        if state == "out":
            if line.startswith("segmentC {"):
                state = "in"
                continue
        elif state == "in":
            if line.startswith("}"):
                state = "out"
                break
            # Work on the specific lines here
            yield line 

with open(...) as fh:
    for line in parsesegment(fh):
        # do something
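Generalising the same state machine slightly, every segment can be collected into a dict in one pass. A self-contained sketch (the helper name `parse_segments` is made up here; it assumes `{` ends a header line and `}` stands alone):

```python
def parse_segments(text):
    """Collect each segment's content lines into a dict keyed by segment name."""
    segments, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if current is None:
            if line.endswith('{'):
                current = line[:-1].strip()   # "segmentC {" -> "segmentC"
                segments[current] = []
        elif line == '}':
            current = None
        elif line:
            segments[current].append(line)
    return segments

sample = "segmentA {\n content Aa\n}\nsegmentC {\n content Ca\n}\n"
print(parse_segments(sample))
# {'segmentA': ['content Aa'], 'segmentC': ['content Ca']}
```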

Simple lexer

If you need more flexibility, you can design a simple lexer/parser pair. For example, the following code makes no assumption about how the syntax is organised across lines. It also ignores unknown patterns, which a typical lexer would not (one would usually raise a syntax error):

import re

class ParseSegment:
    # Dictionary of patterns per state
    # Tuples are (token name, pattern, state change command)
    _regexes = {
        "out": [
            ("open", re.compile(r"segment(?P<segment>\w+)\s+\{"), "in")
        ],
        "in": [
            ("close", re.compile(r"\}"), "out"),
            # Here an example of what you could want to match
            ("content", re.compile(r"content\s+(?P<content>\w+)"), None)
        ]
    }

    def lex(self, source, initpos = 0):
        pos = initpos
        end = len(source)
        state = "out"
        while pos < end:
            for token_name, reg, state_chng in self._regexes[state]:
                # Try to get a match
                match = reg.match(source, pos)
                if match:
                    # Advance according to how much was matched
                    pos = match.end()
                    # yield a token if it has a name
                    if token_name is not None:
                        # Yield token name, the full matched part of source
                        # and the match grouped according to (?P<tag>) tags
                        yield (token_name, match.group(), match.groupdict())
                    # Switch state if requested
                    if state_chng is not None:
                        state = state_chng
                    break
            else:
                # No match, advance by one character
                # This is particular to that lexer, usually no match means
                # the input file has an error in the syntax and lexer should
                # yield an exception
                pos += 1

    def parse(self, source, initpos = 0):
        # This is an example of use of the lexer with a parser
        # This converts the input file into a dictionary. Keys are segment
        # names, and values are list of contents.
        segments = {}
        cur_segment = None
        # Use lexer to get tokens from source
        for token, fullmatch, groups in self.lex(source, initpos):
            # On open, create the list of content in segments
            if token == "open":
                cur_segment = groups["segment"]
                segments[cur_segment] = []
            # On content, ensure we know the segment and add content to the
            # list
            elif token == "content":
                if cur_segment is None:
                    raise RuntimeError("Content found outside a segment")
                segments[cur_segment].append(groups["content"])
            # On close, set the current segment to unknown
            elif token == "close":
                cur_segment = None
            # ignore unknown tokens, we could raise an error instead
        return segments

def main():
    with open("...", "r") as fh:
        data = fh.read()
        lexer = ParseSegment()
        segments = lexer.parse(data)
        print(segments)
    return 0

if __name__ == '__main__':
    main()

Full lexer

If you need even more flexibility and reusability, you will have to create a full parser. No need to reinvent the wheel: have a look at this list of language parsing modules, you will probably find one that suits you.
