I have a large string of the following form: [' some text [ARG1: some inner text [1: some more text], and also [ other inner text [TAG: TAG_TYPE (0.99)]] ]', 'some more text ( some text in parentheses [2: words [ARG1: more words [ARGM-TYPE: even more nested words]]] [other text in square brackets []])']

I want to capture everything inside the single quotes, which I can do with a simple ('(.*?)')
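As a minimal sketch of that first step, the snippet below pulls out the quoted segments from a shortened, made-up input of the same shape (the `raw` string is illustrative, not the real data). It assumes the segments themselves never contain a single quote. If the text really is a valid Python list literal, `ast.literal_eval` is an alternative that avoids regex for this step.

```python
import ast
import re

# Hypothetical miniature input in the same shape as the real data.
raw = "[' some text [ARG1: inner [1: more]] ', 'second ( item ) [2: words]']"

# Non-greedy match between two single quotes; this assumes that no segment
# contains a single quote of its own.
segments = re.findall(r"'(.*?)'", raw)
print(segments)

# Alternative, only valid if the text is a well-formed Python list literal.
as_list = ast.literal_eval(raw)
print(as_list == segments)
```

Each element of `segments` can then be parsed on its own, which sidesteps the question of how to apply the inner patterns "inside" the quote group.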


I can capture some of the subgroups on their own, for example: (\[ONTOTYPE: PERSON \((0\.(\d{1,4})\))\])
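The pattern above is tied to the PERSON label. A slightly more general sketch, assuming every tag looks like `[ONTOTYPE: LABEL (score)]` with an upper-case label and a decimal score (as in the samples in this post), could use named groups. The names `ONTOTYPE`, `label` and `score` are chosen here for illustration.

```python
import re

# Assumption: tags always have the form "[ONTOTYPE: LABEL (score)]",
# where LABEL is upper-case letters/underscores and score is a decimal.
ONTOTYPE = re.compile(
    r"\[ONTOTYPE:\s*(?P<label>[A-Z_]+)\s*\((?P<score>\d*\.?\d+)\)\]"
)

sample = (
    "I. Suzor P. Yu [ONTOTYPE: PERSON (0.851)] on "
    "the Fontanka River [ONTOTYPE: WORK_OF_ART (0.8261)]"
)

# Collect (label, score) for every tag found in the sample.
tags = [(m["label"], float(m["score"])) for m in ONTOTYPE.finditer(sample)]
print(tags)
```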

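But I seem to be missing some basic insight into handling optional nesting. If there is a concept I am missing, I would welcome links to any good explanation.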



In particular, group 15 in match 1 contains a substring that matches one of the expressions, but it is not parsed any further.

ETA (edited to add): here are some examples of input and expected output. Let's take the one from the regex101 page: ' [ ARG0 : Those ] [ R - ARG0 : who ] [ V : graduated ] [ ARG1 : from [0: the school ] ] were promoted from provincial secretary to titular adviser . '-->{ARG0:Those, R-ARG0: who, V:graduated, ARG1:from, 0: the school, <rest of text>}. Here I have turned match 1 into a dict whose key-value pairs are the groups, not necessarily in match order.
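Python's built-in `re` module has no recursive patterns, so one common way to handle arbitrary nesting is to tokenize on the brackets and track depth with a stack. The following is only a sketch under assumptions: the function name `parse_brackets` is made up here, each bracket is split into label and text at its first colon, and nested brackets are excluded from their parent's text. It returns a list of pairs rather than a dict because labels such as ARG1 can repeat within one segment.

```python
import re

# A token is either a single bracket or a run of non-bracket text.
TOKEN = re.compile(r"[\[\]]|[^\[\]]+")


def parse_brackets(text):
    """Return (pairs, rest): (label, own_text) pairs in opening order."""
    slots = []   # text chunks per bracket, in the order brackets open
    stack = []   # indices into slots for the brackets currently open
    rest = []    # text that sits outside every bracket
    for tok in TOKEN.findall(text):
        if tok == "[":
            slots.append([])
            stack.append(len(slots) - 1)
        elif tok == "]":
            if stack:            # ignore an unmatched closing bracket
                stack.pop()
        elif stack:
            slots[stack[-1]].append(tok)   # text belongs to innermost bracket
        else:
            rest.append(tok)
    pairs = []
    for chunks in slots:
        # Split at the first colon; a bracket without a colon ends up as a
        # label with empty text.
        label, _, body = "".join(chunks).partition(":")
        pairs.append(("".join(label.split()), " ".join(body.split())))
    return pairs, " ".join("".join(rest).split())


sample = (" [ ARG0 : Those ] [ R - ARG0 : who ] [ V : graduated ] "
          "[ ARG1 : from [0: the school ] ] were promoted from provincial "
          "secretary to titular adviser . ")
pairs, rest = parse_brackets(sample)
print(pairs)
print(rest)
```

On this sample the pairs come out as ARG0/Those, R-ARG0/who, V/graduated, ARG1/from and 0/the school, with the trailing sentence in `rest`. Oddities in the real data, such as `[ [ ARG1 ] : ... ]`, would need extra rules.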

Let's take the larger example from the start of the list and use numbered capture groups: [' 2 Architects : [ V : Stasov ] [ ARG1 : V. P. Melnikov ] [ ARG1 : A. ] [ ARGM - LOC : [2: I. Suzor P. Yu [ONTOTYPE: PERSON (0.851)] ] ] . Year of construction : [1: 1835 ] , 1895 - 1910 [ONTOTYPE: DATE (0.8774)] Style : Classicism [0: School of Law ] Classicism on [3: the Fontanka River [ONTOTYPE: WORK_OF_ART (0.8261)] ] , [4: 6 - Tchaikovsky Street ] , [5: 1 - Oruzhenik Fedorov Street ] , 2 - A. A. Rzhevsky House 1790 [ONTOTYPE: DATE (0.7046)] - [0: School of Law ][1: 1835 ] - arch . Stasov Vasily Petrovich [ONTOTYPE: PERSON (0.4863)] , arch . Melnikov Avraham Ivanovich [ONTOTYPE: PERSON (0.7781)] ( ? ) ' becomes

group 1: 2 Architects
group 2: V
group 3: Stasov
group 4: ARG1
group 5: V.P. Melnikov
group 6: ARG1
group 7: A
group 8: ARGM-LOC
group 9: 2
group 10: I.   Suzor   P.   Yu
group 11: ONTOTYPE: PERSON (0.851)
group 12: Year of construction
group 13: 1
group 14: 1835


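Update: I have now built a second version of the regex: https://regex101.com/r/bzSCD0/2/ The idea is to first capture all the simple groups (which occur often), and then use a backreference to try to optionally capture them inside other groups. I still don't know how to apply all of this to the string between the single quotes (i.e. the ('(.*?)') group).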


import re
['  2  Architects  :  [  V  :  Stasov  ]  [  ARG1  :  V.  P.  Melnikov  ]  [  ARG1  :  A.  ]  [  ARGM  -  LOC  : [2:  I.   Suzor   P.   Yu [ONTOTYPE: PERSON (0.851)] ] ]  .  Year  of  construction  : [1:  1835  ] ,  1895  -  1910 [ONTOTYPE: DATE (0.8774)]  Style  :  Classicism [0:  School   of   Law ] Classicism  on [3:  the   Fontanka   River [ONTOTYPE: WORK_OF_ART (0.8261)] ] , [4:  6    -   Tchaikovsky   Street ] , [5:  1   -   Oruzhenik   Fedorov   Street ] ,  2  -  A.  A.  Rzhevsky  House  1790 [ONTOTYPE: DATE (0.7046)]  - [0:  School   of   Law ][1:  1835  ] -  arch  .  Stasov  Vasily  Petrovich [ONTOTYPE: PERSON (0.4863)]  ,  arch  .  Melnikov  Avraham  Ivanovich [ONTOTYPE: PERSON (0.7781)]  (  ?  )  ', '  -  perestroika  ,  adaptation  1895 [ONTOTYPE: DATE (0.9555)]  ,  1909  -  1910 [ONTOTYPE: DATE (0.927)]  - [2:  archbishop   Suzor   Pavel   Yulievich [ONTOTYPE: PERSON (0.7866)] ] -  perestroika  (  [  ]  .  ', '  C.  ', '  [  ARGM  -  PRD  :  293   )  ( [3:  Fontanka   River [ONTOTYPE: ORG (0.595)] ] ]  [  ARGM  -  MNR  :  , [4:  6    -   Tchaikovsky   Street ] ,  1  ,  the  right  part  - [5:  Oruzheynik   Fedorov   Street [ONTOTYPE: FAC (0.7551)] ] ,  2  ]  ,  [  ARG0  :  the  left  part  )  ]  [  V  :  see  ]  [  ARGM  -  DIS  :  also  ]  [ [ ARG1 ] :  the  building  with  columns  and  corner  domes  -  Imperial  School  of  Law [ONTOTYPE: ORG (0.6317)]  ]  .  ', '  [  ARG2  :  The  house  ]  [  V  :  occupies  ]  [ [ ARG1 ] :  an  entire  block  ]  .  ', '  In  the  XYIII  century [ONTOTYPE: DATE (0.841)]  ,  there  was  a  laundromat  on  this  site  .  
', '  In  1788 [ONTOTYPE: DATE (0.9969)]  ,  [  ARG1  :  Alexey  Andreevia  Rzhevsky [ONTOTYPE: PERSON (0.7615)]  ]  ,  [  R  -  ARG1  :  who  ]  had  [  ARGM  -  TMP  :  recently  ]  [  V  :  married  ]  [  ARG2  :  Glafira  Alymova [ONTOTYPE: PERSON (0.8034)]  (  a  graduate  of  the  Smolny  Institute  )  ] [ONTOTYPE: ORG (0.7244)]  ,  bought  the  buildings  on  the  old  Palace  Spare  Court [ONTOTYPE: FAC (0.9785)]  ,  which  were  to  be  demolished  ,  and  by  1790 [ONTOTYPE: DATE (0.9945)]  he  had  built [8:  a   building   of   two    wings   connected   by   a   gate ] .  ', '  [  ARGM  -  DIS  :  In  1793 [ONTOTYPE: DATE (0.7649)]  ]  ,  [  ARG0  : [7:  the   couple ] ]  [  V  :  sold  ]  [ [ ARG1 ] : [8: [7:  their ]  house ] ]  [  ARG2  :  to  Countess  Maria  Iosifovna  Potocka [ONTOTYPE: PERSON (0.8324)]  ,  née  Mnišek [ONTOTYPE: PERSON (0.9412)]  ]  .  ']
['Arbigny [ONTOTYPE: PERSON (0.9974)]  ,  who  had  leased [1:  it ] to  the  Paget  Corps [ONTOTYPE: ORG (0.9155)]  for  eight  years [ONTOTYPE: DATE (0.9722)]  .  ``  ,  ', '  ,  ``  [ [ ARG1 ] :  The  first [ONTOTYPE: ORDINAL (0.9903)]  major  reconstruction  of [1:  the   house ] ]  [  V  :  took  ]  [  ARG2  :  place  ]  [  ARGM  -  LOC  :  in [3:  1814 ] -  1819 [ONTOTYPE: DATE (0.8943)]  ]  .  [  ARGM  -  ADV  :  In [5:  1835 ] ]  , [4:  Neplyev   `s ] heirs  sold [1:  the   house ] to [6:  Prince   Peter   Georgievich [ONTOTYPE: PERSON (0.9766)]   of   Oldenburg [ONTOTYPE: GPE (0.9829)]   ,   who   decided   to   establish   a   School   of   Law   there ] .  ``  ,  ', '  ,  ', '  ,  ', '  ,  ', '  ,  ']
['  [  ARG0  :  Those  ]  [  R  -  ARG0  :  who  ]  [  V  :  graduated  ]  [  ARG1  :  from [0:  the   school ] ]  were  promoted  from  provincial  secretary  to  titular  adviser  .  ', '  Architect  [  ARG0  :  A.I.  Melnikov [ONTOTYPE: PERSON (0.8799)]  ]  [  V  :  created  ]  [  ARG1  :  a  reconstruction  project  -  building  a  gap  between  the  buildings  ,  decorated  the  facade  with  a  portico  ,  courtyard  outbuildings  ,  and  a  house  ]  .  ', ' [1:  Church ] .  ', ' [2:  The   trustee   of  [0:  the   school ]] is  Prince  Peter  Georgievich [ONTOTYPE: PERSON (0.9901)]  of  Oldenburg [ONTOTYPE: GPE (0.9956)]  ,  a  close  relative  of  the  imperial  family  .  ', '  [  ARGM  -  ADV  :  From  1860 [ONTOTYPE: WORK_OF_ART (0.6234)]  ]  ,  [  ARG0  : [2:  he ] ]  [  V  :  headed  ]  [  ARG1  :  the  IV  Branch  of  the  Imperial  Chancellery [ONTOTYPE: ORG (0.9247)]  ,  a  charitable  agency  ]  .  ', '  [  ARG0  : [2:  He ] ]  [  V  :  invested  ]  [  ARG1  :  energy  and  resources  ]  [  ARG2  :  in  the  creation  of  hospitals  ,  shelters  ,  and  educational  institutions  ]  .  ']
['  [  ARGM  -  LOC  :  In  1836  -  1840 [ONTOTYPE: DATE (0.8964)]  ]  ,  [  ARG0  : [1:  V.P.   Stasov ] ]  [  V  :  completed  ]  [  ARG1  :  a  number  of  interior  spaces  ,  including  a  large  hall  and  a  house  church  ]  .  ', '  (  not  dry  )  .  ', '  In  the  left  and  right  wings  of [2:  the   school ] were  the  apartments  of [0:  teachers   and   employees ] .  ', '  Among [0:  them ] :  [  ARG0  :  writer  I.  S.  Aksakov [ONTOTYPE: PERSON (0.986)]  ,  poet  A.  N.  Apukhtin [ONTOTYPE: PERSON (0.9919)]  ,  biologist  V.  O.  Kovalevsky [ONTOTYPE: PERSON (0.9897)]  ]  ,  [  V  :  composers  ]  [  ARG1  :  A.  N.  Serov [ONTOTYPE: PERSON (0.9888)]  and  P.  I.  Tchaikovsky [ONTOTYPE: PERSON (0.9887)]  , [1:  art   critic   V.   V.   Stasov [ONTOTYPE: PERSON (0.99)] ] and [1:  his ] brother  ,  famous  lawyer  D.  V.  Stasov [ONTOTYPE: PERSON (0.9937)]  ,  architect  P.  Yu  .  Suzor [ONTOTYPE: PERSON (0.8141)]  ,  scientist  V.  O.  Kovalevsky [ONTOTYPE: PERSON (0.9897)]  ,  chess  player  A.  A.  Alekhin [ONTOTYPE: PERSON (0.9898)]  ]  .  ', '  [  ARGM [ONTOTYPE: ORG (0.8209)]  -  LOC  :  In  1893  -  1895 [ONTOTYPE: DATE (0.8547)]  and  1909  -  1910 [ONTOTYPE: DATE (0.8661)]  ]  ,  [  ARG0  :  Pavel  Yulievich  Suzor  ]  [  V  :  rebuilt  ]  [  ARG1  : [2:  the   building ] ]  ,  [  ARGM [ONTOTYPE: ORG (0.8209)]  -  ADV  :  removing  the  two   middle  columns  of  the  portico  and  making  a  new  main  entrance  ]  .  ', '  [  ARG1  :  The  fronton  ]  was  [  V  :  replaced  ]  [  ARGM  -  MNR  :  with  a  stepped  attic  ]  ,  and  the  shape  of  the  dome  was  changed  .  ', '  In  the  second  half  of  1822 [ONTOTYPE: DATE (0.9192)]  ,  the  Decembrist  G.  S [ONTOTYPE: ORG (0.6)].  [  ARG0  :  Batenkov [ONTOTYPE: PERSON (0.9606)]  (  1793  -  1863 [ONTOTYPE: DATE (0.8889)]  )  ]  [  V  :  lived  ]  [  ARGM  -  LOC  :  in  Z [ONTOTYPE: GPE (0.9057)].  Z [ONTOTYPE: GPE (0.9057)] [ONTOTYPE: WORK_OF_ART (0.7346)].  
In  1874  -  1918 [ONTOTYPE: DATE (0.9104)]  ,  V.  G.  Fedorov [ONTOTYPE: PERSON (0.9568)]  (  1874  -  1966 [ONTOTYPE: DATE (0.838)]  )  ,  a  designer  and  gunner  ]  ,  lived  in  Z [ONTOTYPE: GPE (0.9057)].  ']
['  [  ARGM  -  TMP  :  In  the  1920s [ONTOTYPE: DATE (0.7653)]  ]  ,  [ [ ARG1 ] : [1:  the   building ] ]  was  [  V  :  occupied  ]  [  ARG0  :  by [4:  the   Agricultural   Institute [ONTOTYPE: ORG (0.7449)] ] ]  .  ', '  [  ARG0  :  Teachers  ]  [  ARGM  -  DIS  :  also  ]  [  V  :  lived  ]  [  ARGM  -  LOC  :  here  ]  .  ', '  In  square  8  -  1928  -  1948 [ONTOTYPE: DATE (0.7954)]  A.  M.  Innokentyevich [ONTOTYPE: PERSON (0.8596)]  -  founder  of  hematology  in  the  USSR [ONTOTYPE: GPE (0.9975)]  .  ', '  [  ARG1  :  The  main  building  of [0:  the   1960 [ONTOTYPE: DATE (0.9949)]   -   1970 [ONTOTYPE: DATE (0.9065)]   ``   s ] ]  [  V  :  housed  ]  [  ARG2  :  the  NT  and  Lenelectronmash  ]  ,  and  since [0:  the   1960 [ONTOTYPE: DATE (0.9949)]   ``   s ] Lenzhradproekt [ONTOTYPE: PRODUCT (0.6771)]  has  been  located  .  ', '  [  ARGM  -  LOC  :  In  the  left  part  of [1:  the   building ] ]  ,  [  ARG1  :  the  Economic  and  Mathematical  Institute [ONTOTYPE: ORG (0.6906)]  of  the  Russian  Academy  of  Sciences [ONTOTYPE: ORG (0.8781)]  ]  is  [  ARGM  -  TMP  :  now  ]  [  V  :  located  ]  [  ARGM  -  LOC  :  on  the  site  of  former  residential  apartments  ]  .  ', '  To  the  left  of [1:  the   school   building ] ,  on  [ [ ARG1 ] :  the  plot  ]  [  V  :  owned  ]  [  ARG0  :  by  him  ]  ,  was  a  1-storey  building  of  the  school  infirmary  .  ', '  [  ARGM  -  LOC  :  In  1861 [ONTOTYPE: DATE (0.967)]  ]  ,  [  ARG0  : [3:  architect   V.P.   Lvov [ONTOTYPE: PERSON (0.6422)] ] ]  [  V  :  built  ]  [ [ ARG1 ] :  baths  ]  [  ARGM  -  LOC  :  in [3:  his ] place  in  the  1938 [ONTOTYPE: DATE (0.9597)]  ``  s  ]  [  ARGM  -  ADV  :  according  to  the  design  of  Alexander  Ivanovich  Hegello [ONTOTYPE: PERSON (0.8942)]  ,  Chairman  of  the  Union  of  Architects  of  Leningrad [ONTOTYPE: ORG (0.8652)]  ]  .  
', '  1922 [ONTOTYPE: DATE (0.9325)]  :  Petrograd  Cooperative [ONTOTYPE: ORG (0.5727)]  of  [  ARG1  : [4:  the   Agronomic   Institute [ONTOTYPE: ORG (0.5568)] ] ]  ,  [  V  :  registered  ]  [  ARG2  :  with  the  Pepo  Cooperative  Commission [ONTOTYPE: ORG (0.9466)]  ]  ;  Natural  History  and  Agriculture  Museum [ONTOTYPE: ORG (0.8408)]  (  ?  )  ']


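# `a` is assumed to hold the raw text shown above as a single string.
# Lazily capture whatever sits between two delimiters ([, ] or :); the lookarounds keep the delimiters out of the match.
groups = [b.strip() for b in re.findall(r"(?<=[\[\]:])(.*?)(?=[\[\]:])", a) if b.strip()]
for i, j in enumerate(groups):
    print("group %d: %s" % (i, j))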


group 0: '  2  Architects
group 1: V
group 2: Stasov
group 3: ARG1
group 4: V.  P.  Melnikov
group 5: ARG1
group 6: A.
group 7: ARGM  -  LOC
group 8: 2
group 9: I.   Suzor   P.   Yu
group 10: ONTOTYPE
group 11: PERSON (0.851)
group 12: .  Year  of  construction
group 13: 1
group 14: 1835
group 15: ,  1895  -  1910
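In the output above, each ONTOTYPE tag is split at its inner colon (groups 10 and 11), whereas the expected output in the question keeps it as one group. One possible variation, sketched here with a different technique (alternation instead of lookarounds), tries the whole tag first so its colon never acts as a separator. The name `PIECE` and the shortened `sample` are illustrative assumptions.

```python
import re

# Alternation order matters: a complete ONTOTYPE tag is tried first; only
# otherwise do we take a run of text free of brackets and colons.
PIECE = re.compile(r"\[ONTOTYPE:[^\]]*\]|[^\[\]:]+")

# Shortened, illustrative excerpt of the first segment.
sample = (" 2 Architects : [ V : Stasov ] [ ARGM - LOC : [2: I. Suzor P. Yu "
          "[ONTOTYPE: PERSON (0.851)] ] ] . Year of construction : [1: 1835 ] ")

groups = []
for piece in PIECE.findall(sample):
    # Drop surrounding whitespace and the tag's own brackets.
    piece = piece.strip().strip("[]")
    if piece:
        groups.append(piece)

for i, g in enumerate(groups):
    print("group %d: %s" % (i, g))
```

This is still a flat split: it keeps the tag whole but does not record which bracket a piece was nested in.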

