str.split on page numbers

Posted 2024-09-28 22:31:34

I am doing some simple text extraction on bibliographic data and have strings like the following:

texts = '36 L. Ronse De Craene / Flora 221 (2016) 22–37Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264, 183–193.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92, 114–134.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1993. CLAVATA1, a regulator ofmeristem and flower development in Arabidopsis. Development 119, 397–418.Clark, S.E., Running, M.P., Meyerowitz, E.M., 1995. CLAVATA3 is a specific regulatorof shoot and floral meristem development affecting the same processes asCLAVATA1. Development 121, 2057–2067.Costello, A., Motley, T.J., 2004. The development of the superior ovary inTetraplasandra (Araliaceae). Am. J. Bot. 91, 644–655.Davidson, C., 1973. An anatomical and morphological study of Datiscaceae. Aliso 8,49–110.Dickison, W.C., 1990a. A study of the floral morphology and anatomy of theCaryocaraceae. Bull. Torrey Bot. Club 117, 123–137'

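I want to split this string at the page numbers. The page range at the end of each entry has the form xxx-xxx, where each x is a digit, so I thought something like this should work: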

re.split(r'\d+\-\d+', texts)

I have tried several variations without success. I don't use regular expressions often, and I suspect I am missing something small.

My goal is:

['36 L. Ronse De Craene / Flora 221 (2016)',

'Chen, L., Ren, Y., Endress, P.K., Tian, X.H., Zhang, X.H., 2007. Floral organogenesis inTetracentron sinense (Trochodendraceae) and its systematic significance. PlantSyst. Evol. 264,',

'.Choob, V.V., Yurtseva, O.V., 2007. Mathematical model of flower formation in thePolygonaceae members. Bot. Zh. 92,',  

...] 
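One possible cause, judging from the sample string as shown here: the page ranges appear to be written with an en dash (`–`, U+2013) rather than an ASCII hyphen, so `\d+\-\d+` never matches and `re.split` returns the whole string unchanged. The sketch below assumes that is the case and uses a shortened, made-up `sample` in the same shape as `texts`; the names `sample`, `page_range`, and `parts` are illustrative only.

```python
import re

# Shortened stand-in for `texts`; page ranges use an en dash (U+2013),
# as they appear to in the original string.
sample = (
    "36 L. Ronse De Craene / Flora 221 (2016) 22\u201337"
    "Chen, L., 2007. Floral organogenesis. Evol. 264, 183\u2013193"
    ".Choob, V.V., 2007. Mathematical model. Bot. Zh. 92, 114\u2013134."
)

# The hyphen-only pattern finds no match, so the string comes back whole.
unsplit = re.split(r"\d+\-\d+", sample)

# Accept either an ASCII hyphen or an en dash between the two numbers.
page_range = re.compile(r"\d+[-\u2013]\d+")

# Split, trim surrounding whitespace, and drop empty pieces.
parts = [p.strip() for p in page_range.split(sample) if p.strip()]

for part in parts:
    print(repr(part))
```

If the real data mixes other dash characters (for example an em dash or a minus sign), they would need to be added to the character class as well. Note that any leading `.` left over from the previous entry stays attached to the next piece, as in the target output above.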