Extracting exactly four-digit integers from strings with a regular expression

Posted 2024-05-17 07:35:02


list1 = ['Contact: Hamdan Z Hamdan, MBBS, Msc',
        '\r\n            ',
        '+249912468264',
        '\r\n                  ',
        'hamdanology@hotmail.com',
        '\r\n                ',
        'Contact: Maha I Mohammed, MBBS, PhD',
        '\r\n            ',
        '+249912230895',
        '\r\n                  ',
        '\r\n                ',
        'Sudan',
        'Jaber abo aliz',
        '\r\n                  ',
        'Recruiting',
        '\r\n          ',
        'Khartoum, Sudan, 1111  ',
        u'Contact: Khaled H Bakheet, MD,PhD \xa0 \xa0 +249912957764 \xa0 \xa0 ',
        'khalid2_3456@yahoo.com',
        u' \xa0 \xa0 ',
        u'Principal Investigator: Hamdan Z Hamdan, MBBS,MSc \xa0 \xa0  \xa0 \xa0  \xa0 \xa0 ',
        'Principal Investigator:',
        '\r\n      ',
        'Hamdan Z Hamdan, MBBS, MSc',
        '\r\n            ',
        'Al-Neelain University',
        '\r\n                '
    ]

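From this list of strings, I only need to extract four-digit integers that are not attached to any other digits or characters.

Example: "1111" is the desired output.

How should I write this regex in Python? This obviously doesn't work: *([\d]{4})*.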
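The pattern in the question does more than fail to match: in Python a leading `*` has nothing to repeat, so compiling it raises `re.error`. The small check below (the helper `try_compile` is a hypothetical name used only for illustration) shows that, and also why simply dropping the stars is not enough: a bare `\d{4}` happily matches pieces of longer numbers such as the phone numbers in the list.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)

# The pattern from the question: the leading '*' quantifies nothing.
print(try_compile(r"*([\d]{4})*"))

# Without the stray quantifiers it compiles...
print(try_compile(r"([\d]{4})"))

# ...but \d{4} alone also splits a longer digit run into four-digit chunks.
print(re.findall(r"\d{4}", "+249912468264"))
```

This is why both answers below surround `\d{4}` with some kind of boundary condition.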


2 answers

You can try the following:

>>> import re
>>> [l for l in (re.findall(r"[^\d](\d{4})[^\d]", s) for s in list1) if l]
[['1111'], ['3456']]
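One caveat with `[^\d]`: it consumes one non-digit character on each side, so a four-digit number at the very start or end of a string is missed, and two numbers separated by a single character cannot both match because the separator is used up by the first. A sketch of an alternative using lookarounds (a swapped-in technique, not part of the answer above), which checks the neighbours without consuming them:

```python
import re

# Pattern from the answer: needs one non-digit on each side and consumes it.
CONSUMING = re.compile(r"[^\d](\d{4})[^\d]")

# Lookbehind/lookahead only assert "no digit here", so they also succeed
# at the start or end of the string and leave separators unconsumed.
LOOKAROUND = re.compile(r"(?<!\d)\d{4}(?!\d)")

samples = ["1111", " 1111 2222 ", "Khartoum, Sudan, 1111  ", "+249912468264"]
for s in samples:
    print(repr(s), CONSUMING.findall(s), LOOKAROUND.findall(s))
```

Like the original pattern, the lookaround version still treats `_` as a valid neighbour, so it would also return `3456` from `khalid2_3456@yahoo.com`.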

If you're only interested in four-digit numbers on word boundaries, use:

>>> [l for l in (re.findall(r"\b\d{4}\b", s) for s in list1) if l]
[['1111']]
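The list comprehensions in this answer produce one inner list per matching string, e.g. `[['1111'], ['3456']]`. If a single flat list is more convenient, here is a minimal sketch using `itertools.chain.from_iterable`; the helper name `standalone_four_digits` is hypothetical.

```python
import re
from itertools import chain

# Four digits with a word boundary on each side.
WORD_BOUNDED = re.compile(r"\b\d{4}\b")

def standalone_four_digits(strings):
    """Return every word-bounded four-digit run, as strings, in one flat list."""
    # findall returns every non-overlapping match in each string;
    # chain.from_iterable flattens the per-string lists into one.
    return list(chain.from_iterable(WORD_BOUNDED.findall(s) for s in strings))

print(standalone_four_digits(
    ["Khartoum, Sudan, 1111  ", "codes 1234 and 5678", "+249912468264"]
))
```

The matches are kept as strings on purpose: converting with `int()` would turn a value like `0042` into `42`.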

You can use \b in a regular expression to mark a word boundary, so the following works:

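import re

for s in list1:
    # \b on both sides: the four digits must not touch letters, digits or '_'
    m = re.search(r'\b\d{4}\b', s)
    if m:
        # re.search returns only the first match in each string
        print(m.group(0))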

...and it prints only 1111. The documentation for the `re` module explains \b further:

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. [...]
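Because `_` counts as a word character, there is no boundary between the `_` and the `3` in `khalid2_3456@yahoo.com`. That is why the `\b` version returns only `1111` while the `[^\d]` version from the first answer also picks up `3456`. The short demonstration below shows this, plus the fact that the non-breaking space `\xa0` that appears in the list is not a word character, so it does act as a boundary:

```python
import re

email = "khalid2_3456@yahoo.com"

# '_' is a word character, so \b does not fire between '_' and '3'.
print(re.findall(r"\b\d{4}\b", email))           # []

# [^\d] only excludes digits, so '_' satisfies it and 3456 is captured.
print(re.findall(r"[^\d](\d{4})[^\d]", email))   # ['3456']

# '\xa0' (non-breaking space) is not alphanumeric, so it forms a boundary.
print(re.findall(r"\b\d{4}\b", "Sudan\xa01111\xa0"))  # ['1111']
```

Which behaviour is "right" depends on whether digits glued to an underscore should count as standalone.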
