I'm doing some simple text extraction on bibliographic data and have a string like the following:
texts = '36 L. Ronse De Craene / Flora 221 (2016) 22–37Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264, 183–193.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92, 114–134.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1993. CLAVATA1, a regulator ofmeristem and flower development in Arabidopsis. Development 119, 397–418.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1995. CLAVATA3 is a specific regulatorof shoot and floral meristem development affecting the same processes asCLAVATA1. Development 121, 2057–2067.Costello, A., Motley, T.J., 2004. The development of the superior ovary inTetraplasandra (Araliaceae). Am. J. Bot. 91, 644–655.Davidson, C., 1973. An anatomical and morphological study of Datiscaceae. Aliso 8,49–110.Dickison, W.C., 1990a. A study of the floral morphology and anatomy of theCaryocaraceae. Bull. Torrey Bot. Club 117, 123–137'
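I want to split this string at the page numbers. The page numbers at the end of each entry appear in the form xxx-xxx, where x is a digit, so I thought something like this should work: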
re.split(r'\d+\-\d+', texts)
I've tried a few different variations without success. I don't use regular expressions often, so I suspect I'm missing something small.
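A quick way to reproduce the problem is to ask the pattern what it actually matches, then inspect the character sitting between the page numbers. This sketch uses a shortened excerpt of `texts` (an assumption, to keep it small) and the standard-library `unicodedata` module to name the characters:

```python
import re
import unicodedata

# Shortened excerpt of the question's `texts` string (first entry boundary only)
texts = '36 L. Ronse De Craene / Flora 221 (2016) 22–37Chen, L., Ren, Y., 2007.'

# The pattern from the question: digits, an ASCII hyphen-minus, digits
print(re.findall(r'\d+\-\d+', texts))  # → []

# Look at the character just before the "37" in the page range "22–37"
dash = texts[texts.index('37') - 1]
print(dash, hex(ord(dash)), unicodedata.name(dash))  # → – 0x2013 EN DASH
print(unicodedata.name('-'))  # → HYPHEN-MINUS
```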
What I'm aiming for is:
['36 L. Ronse De Craene / Flora 221 (2016)',
'Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264,',
'.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92,',
...]
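The dash in your text string (–, an en dash) is not the same character as the hyphen in your regular expression (-):
You can see the difference when you write one above the other: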
-
–
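A minimal sketch of the fix, assuming you want to accept either dash: put both characters in a character class so the pattern matches "22–37" as well as an ASCII "22-37". The shortened `texts` excerpt and the strip/filter step are additions for illustration; plain `re.split` leaves trailing spaces on each piece and produces an empty final string when the text ends with a page range.

```python
import re

# Shortened excerpt of the question's `texts` (first three entries)
texts = ('36 L. Ronse De Craene / Flora 221 (2016) 22–37'
         'Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. '
         'Floral organogenesis inTetracentron sinense (Trochodendraceae) and its '
         'systematic significance. PlantSyst. Evol. 264, 183–193.'
         'Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower '
         'formation in thePolygonaceae members. Bot. Zh. 92, 114–134')

# Accept an ASCII hyphen-minus or an en dash (U+2013) between the numbers.
# The hyphen comes first in the class, so it is treated as a literal character.
pattern = r'\d+[-–]\d+'

parts = re.split(pattern, texts)

# Trim surrounding whitespace and drop empty pieces (the text ends with a range,
# so re.split returns a trailing empty string)
entries = [p.strip() for p in parts if p.strip()]

for entry in entries:
    print(entry)
```

The leading "." on the third entry is kept so the output matches the goal above; `p.strip(' .')` would remove it if you prefer.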