Extracting data from matches of a Python regular expression

Posted 2024-10-02 18:18:45


I have the following multi-line string:

/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);

If I use the following regular expression, I get two matches:

regex -> (?P<desc>(\/\*[\s\S]+?\*\/$))(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))

Match #1:

desc:

/*dummy comment */ 

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj*/

fun:

int sum(int a,int b);

Match #2:

desc:

/* comment about mul function */

fun:

 int mul(int a,int b);

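For match #1 I get both comments, but I only need the last one, i.e. /* comment about sum function jkhkdhfljkldjf kjsdkjflskj */. I do not want /*dummy comment */ to be part of the match.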

Please help me get the following output:

Match #1:

desc:

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj*/

fun:

int sum(int a,int b);

Match #2:

desc:

/* comment about mul function */

fun:

 int mul(int a,int b);
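
A note on why match #1 can pick up both comments, as a minimal sketch. A lazy quantifier such as `[\s\S]+?` only keeps the match as short as possible from the position where it started. The match still begins at the first `/*`, and because `[\s\S]` also matches `*` and `/`, it is free to run past the first `*/` when no declaration follows it. The pattern below is a simplified stand-in chosen for illustration (an assumption, not the exact regex above), and `text` and `lazy_re` are hypothetical names.

```python
import re

# Shortened version of the sample: a stray comment, then a documented function.
text = "/*dummy comment */\n\n/* comment about sum */\n\nint sum(int a,int b);"

# Simplified stand-in for the pattern in the question (illustrative assumption).
# The lazy [\s\S]+? may still cross "*/" to reach a point where "int ..." follows.
lazy_re = re.compile(r"/\*[\s\S]+?\*/\s*int\s+\w+\s*\([^)]*\);")

overmatch = lazy_re.search(text).group()

# The single match begins at the dummy comment and swallows the second comment too.
print(overmatch)
```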

1 answer
User
Answer #1 · Posted 2024-10-02 18:18:45

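I could not debug your regex because its formatting appears to be broken in the example, so here is a working snippet that shows one way to approach it. Read through the comments carefully, since they explain how the regular expression works.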

import re

# sample text as in the question
sample_str = """/*dummy comment */

/* comment about sum function */

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);"""

# Match the regex below and capture its match into a backreference named “desc” (also backreference number 1) «(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)»
#    Match the character “/” literally «/»
#    Match the character “*” literally «\*»
#    Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#    Match the character string “comment about ” literally (case sensitive) «comment about »
#    Match the regex below and capture its match into a backreference named “func_name” (also backreference number 2) «(?P<func_name>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character string “ function ” literally (case sensitive) « function »
#    Match the character “*” literally «\*»
#    Match the character “/” literally «/»
# Outside the “desc” group, match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the carriage return character «\r*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the line feed character «\n*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regex below and capture its match into a backreference named “fun” (also backreference number 3) «(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))»
#    Match the regex below and capture its match into a backreference named “return_type” (also backreference number 4) «(?P<return_type>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character “ ” literally « »
#    Match the regex below and capture its match into a backreference named “func_name_2” (also backreference number 5) «(?P<func_name_2>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the opening parenthesis character «\(»
#    Match the regex below and capture its match into a backreference named “arguments” (also backreference number 6) «(?P<arguments>[^\)]+?)»
#       Match any character that is NOT the closing parenthesis character «[^\)]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the closing parenthesis character «\)»

function_re = re.compile(r"(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)\s*\r*\n*(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))")

for function_match in function_re.finditer(sample_str):
    # match start: function_match.start()
    # match end (exclusive): function_match.end()
    # matched text: function_match.group()
    print("\ndesc:\n\n{}\n".format(function_match.group("desc")))
    print("fun:\n\n{}\n\n    ".format(function_match.group("fun")))
    # Additional groups if you need them
    # print("Func Name 1: {}".format(function_match.group("func_name")))
    # print("Func Name 2: {}".format(function_match.group("func_name_2")))
    # print("Arguments  : {}".format(function_match.group("arguments")))

The output I get is:

desc:

/* comment about sum function */

fun:

int sum(int a,int b)

    

desc:

/*comment about mul function */

fun:

int mul(int a,int b)
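
The snippet above uses a simplified sample with single-line comments. For the original multi-line string from the question, one possible variation (a sketch, not the answerer's code) is to forbid the comment body from containing `*/`, so that a match can never span two comments. The names `source`, `comment_fn_re` and `matches` are hypothetical, and the declaration part assumes functions of the form `int name(...);` as in the sample.

```python
import re

# The multi-line sample from the question, including the unwanted dummy comment.
source = """/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);"""

comment_fn_re = re.compile(
    # Exactly one comment: each body character is consumed only when
    # it does not start the closing "*/" (negative lookahead).
    r"(?P<desc>/\*(?:(?!\*/)[\s\S])*\*/)"
    # Whitespace and blank lines between the comment and the declaration.
    r"\s*"
    # Assumed declaration shape: "int name(args);"
    r"(?P<fun>int\s+\w+\s*\([^)]*\)\s*;)"
)

matches = [(m.group("desc"), m.group("fun")) for m in comment_fn_re.finditer(source)]

for desc, fun in matches:
    print("desc:\n\n{}\n\nfun:\n\n{}\n".format(desc, fun))
```

Because the dummy comment is not directly followed by a declaration, no match can start there, and the lookahead prevents it from being absorbed into the next comment. Other return types or nested parentheses in the argument list would need a broader `fun` pattern.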

    
