Regex: searching for all expressions that vary in a file

Posted 2024-09-29 20:27:33


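I am trying to search a file (read as a string) and get all the groups in it. Each group has sub-sections, and the number of sub-sections can differ from group to group, so I need to parse every sub-section in the string. The string I am trying to parse is: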

GROUP_DEFN_START
GROUP:REGLOG
HW_REG_OPER "trace hw register operations"
HW_REG_OPER1 "trace hw register operations"
HW_REG_OPER2 "trace hw register operations"
HW_REG_OPER3 "trace hw register operations"
# Add more structs here: <struct name><space>"[<brief description>]"
GROUP:ISRLOG
ISR_STATUS "trace hw isr status"
ISR_STATUS1 "trace hw isr status"
ISR_STATUS2 "trace hw isr status"
ISR_STATUS3 "trace hw isr status"
ISR_STATUS4 "trace hw isr status"
# Add more structs here: <struct name><space>"[<brief description>]"
GROUP:PROCLOG
PROC_STATUS "trace procedure status"
PROC_STATUS1 "trace procedure status"
PROC_STATUS2 "trace procedure status"
PROC_STATUS3 "trace procedure status"
PROC_STATUS4 "trace procedure status"
PROC_STATUS5 "trace procedure status"
# Add more structs here: <struct name><space>"[<brief description>]"
GROUP_DEFN_END

I want to put each group's sub-sections into a two-dimensional list, like this:

[[HW_REG_OPER,HW_REG_OPER1,HW_REG_OPER2,HW_REG_OPER3],[ISR_STATUS,ISR_STATUS1,ISR_STATUS2,ISR_STATUS3,ISR_STATUS4],[PROC_STATUS,PROC_STATUS1,PROC_STATUS2,PROC_STATUS3,PROC_STATUS4,PROC_STATUS5]].......

The number of sub-sections can vary.

group_content = re.findall(r'GROUP:(.*?)\n(.*?)GROUP',spec_content, re.M|re.S|re.X)
STRUCT=re.split('("(.*?)"\n',group_content[0])

I need to refine this further to get the sub-sections.


1 Answer

Answer 1 · posted 2024-09-29 20:27:33

You can get all the blocks with re.findall using the following regex:

(?m)^GROUP:.*((?:\r?\n(?!GROUP:).*)*)

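See the regex demo.

Details:

  • (?m)^ - start of a line
  • GROUP: - a literal substring
  • .* - any 0+ characters other than line break characters, up to the end of the line
  • ((?:\r?\n(?!GROUP:).*)*) - Group 1, whose contents re.findall returns, matching 0+ sequences of:
    • \r?\n - a line break (an optional CR followed by LF)
    • (?!GROUP:) - not followed by the literal character sequence GROUP:
    • .* - any 0+ characters other than line break characters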
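
To make it concrete what Group 1 captures, here is a minimal sketch on a shortened, made-up input. The sample string and the names sample and blocks are assumptions for illustration only, not part of the original data:

```python
import re

# Shortened, hypothetical sample with two groups (not the original input)
sample = (
    "GROUP_DEFN_START\n"
    "GROUP:A\n"
    'X1 "first"\n'
    'X2 "second"\n'
    "GROUP:B\n"
    'Y1 "third"\n'
    "GROUP_DEFN_END"
)

block_regex = re.compile(r'(?m)^GROUP:.*((?:\r?\n(?!GROUP:).*)*)')

# With a single capturing group, findall returns only Group 1:
# the lines that follow each "GROUP:" header, up to the next "GROUP:" line
blocks = block_regex.findall(sample)
for block in blocks:
    print(repr(block))
```

Note that the last block also contains the GROUP_DEFN_END line, because nothing in this pattern stops at it. That line has to be filtered out in the next step.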

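Then, within each block, you need to extract the first word at the start of a line that is followed by whitespace and a double quote, using:

(?m)^(\w+)\s+"

  • (?m)^ - start of a line
  • (\w+) - Group 1, capturing 1+ word characters
  • \s+" - 1+ whitespace characters and a ". This skips the # comment lines and the GROUP_DEFN_END line.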

Python demo

import re
s = "GROUP_DEFN_START\nGROUP:REGLOG\nHW_REG_OPER \"trace hw register operations\"\nHW_REG_OPER1 \"trace hw register operations\"\nHW_REG_OPER2 \"trace hw register operations\"\nHW_REG_OPER3 \"trace hw register operations\"\n# Add more structs here: <struct name><space>\"[<brief description>]\"\nGROUP:ISRLOG\nISR_STATUS \"trace hw isr status\"\nISR_STATUS1 \"trace hw isr status\"\nISR_STATUS2 \"trace hw isr status\"\nISR_STATUS3 \"trace hw isr status\"\nISR_STATUS4 \"trace hw isr status\"\n# Add more structs here: <struct name><space>\"[<brief description>]\"\nGROUP:PROCLOG\nPROC_STATUS \"trace procedure status\"\nPROC_STATUS1 \"trace procedure status\"\nPROC_STATUS2 \"trace procedure status\"\nPROC_STATUS3 \"trace procedure status\"\nPROC_STATUS4 \"trace procedure status\"\nPROC_STATUS5 \"trace procedure status\"\n# Add more structs here: <struct name><space>\"[<brief description>]\"\nGROUP_DEFN_END"
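# Group 1 captures every line after a "GROUP:" header, up to the next "GROUP:" line
block_regex = re.compile(r'(?m)^GROUP:.*((?:\r?\n(?!GROUP:).*)*)')
# Captures the leading word of each line where it is followed by whitespace and a quote
item_regex = re.compile(r'(?m)^(\w+)\s+"')
matches = block_regex.findall(s)
# Build one list of struct names per group
res = [item_regex.findall(m) for m in matches]
print(res)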

Output (reformatted across lines for readability):

[
    ['HW_REG_OPER', 'HW_REG_OPER1', 'HW_REG_OPER2', 'HW_REG_OPER3'], 
    ['ISR_STATUS', 'ISR_STATUS1', 'ISR_STATUS2', 'ISR_STATUS3', 'ISR_STATUS4'],
    ['PROC_STATUS', 'PROC_STATUS1', 'PROC_STATUS2', 'PROC_STATUS3', 'PROC_STATUS4', 'PROC_STATUS5']
]
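
If the group names are needed as well, the same idea can be extended with one extra capturing group. The following is only a sketch under assumptions: the function name parse_groups, the (\S+) capture for the name, and the shortened sample are all hypothetical additions, not part of the answer above.

```python
import re

def parse_groups(text):
    """Return {group name: [struct names]} for a GROUP-style definition string."""
    # Group 1: the name after "GROUP:"; group 2: the lines up to the next "GROUP:" line
    block_regex = re.compile(r'(?m)^GROUP:(\S+).*((?:\r?\n(?!GROUP:).*)*)')
    # Leading word of a line, only when followed by whitespace and a double quote
    item_regex = re.compile(r'(?m)^(\w+)\s+"')
    return {
        name: item_regex.findall(body)
        for name, body in block_regex.findall(text)
    }

# Shortened, hypothetical sample in the same layout as the question's input
sample = (
    "GROUP_DEFN_START\n"
    "GROUP:REGLOG\n"
    'HW_REG_OPER "trace hw register operations"\n'
    'HW_REG_OPER1 "trace hw register operations"\n'
    "# Add more structs here\n"
    "GROUP:ISRLOG\n"
    'ISR_STATUS "trace hw isr status"\n'
    "GROUP_DEFN_END"
)

result = parse_groups(sample)
print(result)
```

With two capturing groups, re.findall returns a list of (name, body) tuples, which is what the dict comprehension unpacks. Calling list(result.values()) gives back the plain two-dimensional list asked for in the question.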
