Regular expression to extract nested patterns inside parentheses and square brackets

import re

# Words wrapped in parentheses
post_script_word_str = '(LEFT-WALL)(who)(is.v)(Obama)(,)(I.p)(love.v)(his)(speech.s)(RIGHT-WALL)'
post_script_word_list = re.compile(r'\(([^\)\(]*)\)').split(post_script_word_str)
print(post_script_word_list)

# Links wrapped in square brackets, each with a parenthesised label
post_script_link_str = '[0 12 4 (RW)][0 7 3 (Xx)][0 1 0 (Wd)][1 2 0 (Ss)][2 6 2 (Ost)][3 6 1 (Ds)][3 4 0 (La)][5 6 0 (AN)][7 8 0 (Wq)][8 9 0 (EAh)][9 10 0 (AF)][10 11 0 (SIs)]'
post_script_link_str = re.compile(r'\[([^\]\[]*)\]').split(post_script_link_str)
print(post_script_link_str)

['', 'LEFT-WALL', '', 'who', '', 'is.v', '(Ob', 'am', 'a)', ',', '', 'I.p', '', 'love.v', '', 'his', '', 'speech.s', '', 'RIGHT-WALL', '']
['[0 ', '1', '2 4 (RW)]', '0 7 3 (Xx)', '', '0 1 0 (Wd)', '', '1 2 0 (Ss)', '', '2 6 2 (Ost)', '', '3 6 1 (Ds)', '', '3 4 0 (La)', '', '5 6 0 (AN)', '', '7 8 0 (Wq)', '', '8 9 0 (EAh)', '', '9 10 0 (AF)', '', '10 11 0 (SIs)', '']


1 answer

User

#1 · Posted 2024-09-26 18:19:59

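The re module cannot handle nested structures. You need the new regex module, which supports recursion. Also, I think the findall method is better suited to this job: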

regex.findall(r'\[([^][]*+(?:(?R)[^][]*)*+)]', post_script_link_str)

You will get:

^{pr2}$

Now you only need to map over that list to remove the square brackets.
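As a rough sketch of that last step: with the capturing group in the pattern above, findall already returns only the text between the brackets. The variant below drops the capturing group so that whole matches come back, then strips the outer brackets with a slice. It assumes the third-party regex package is installed (pip install regex); the helper name bracket_groups is made up for illustration.

```python
try:
    import regex  # third-party package, not the standard-library re
except ImportError:
    regex = None  # assumption: the sketch needs `pip install regex`

# Same pattern as in the answer, minus the capturing group.
BRACKETED = r'\[[^][]*+(?:(?R)[^][]*)*+]'

def bracket_groups(text):
    """Return the contents of each top-level [...] group in text (hypothetical helper)."""
    matches = regex.findall(BRACKETED, text)  # whole matches, brackets included
    return [m[1:-1] for m in matches]         # drop the outer '[' and ']'

if regex is not None:
    links = '[0 12 4 (RW)][0 7 3 (Xx)][0 1 0 (Wd)]'
    print(bracket_groups(links))
```

The slice is used instead of str.strip('[]') so that brackets belonging to a nested inner group at the edge of a match are left alone.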

Pattern details:

(?R) enables recursion, since it is an alias for the whole pattern.
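To make the effect of the recursion concrete, the same top-level grouping can be mimicked without any regular expression by counting bracket depth. This is a stand-in written for illustration, not what the regex module does internally. The function name top_level_brackets is hypothetical, and raising ValueError on unbalanced input is a design choice of this sketch (the recursive pattern would simply not match there).

```python
def top_level_brackets(text, open_ch='[', close_ch=']'):
    """Return the contents of each outermost open_ch...close_ch group."""
    groups = []
    depth = 0      # how many brackets are currently open
    start = None   # index just after the outermost opening bracket
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError('unmatched %r at index %d' % (close_ch, i))
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])  # outermost group just closed
    if depth != 0:
        raise ValueError('unclosed %r' % open_ch)
    return groups

print(top_level_brackets('[0 12 4 (RW)][a [b] c]'))
print(top_level_brackets('(LEFT-WALL)(who)(is.v)', '(', ')'))
```

Each increment of depth plays the role of one recursive entry into the pattern, and a group is complete only when the depth returns to zero.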

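*+ is a possessive quantifier. It behaves like *, but does not allow the regex engine to backtrack. It is used here to prevent catastrophic backtracking in the unfortunate case where the brackets are unbalanced.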
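A small sketch of the difference between the greedy and possessive forms. It assumes either the third-party regex package or Python 3.11 or later, where the standard re module also accepts possessive quantifiers. The toy patterns and the name compare are illustrative and not taken from the answer.

```python
import re
import sys

try:
    import regex as engine          # third-party package, if installed
except ImportError:
    # Standard re understands *+ only from Python 3.11 onwards.
    engine = re if sys.version_info >= (3, 11) else None

def compare(text):
    """Return (greedy_matched, possessive_matched) for the toy patterns."""
    greedy = engine.search(r'a*a', text)       # a* gives one 'a' back so the final a can match
    possessive = engine.search(r'a*+a', text)  # a*+ keeps every 'a', so the final a finds nothing
    return greedy is not None, possessive is not None

if engine is not None:
    print(compare('aaaa'))
```

Because the possessive form never revisits what it has consumed, a failed match is reported without the engine trying every way of splitting the input between the quantified parts.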
