Extracting exactly four-digit integers from strings with a regular expression

list1 = ['Contact: Hamdan Z Hamdan, MBBS, Msc', '\r\n ', '+249912468264', '\r\n ', 'hamdanology@hotmail.com', '\r\n ', 'Contact: Maha I Mohammed, MBBS, PhD', '\r\n ', '+249912230895', '\r\n ', '\r\n ', 'Sudan', 'Jaber abo aliz', '\r\n ', 'Recruiting', '\r\n ', 'Khartoum, Sudan, 1111 ', u'Contact: Khaled H Bakheet, MD,PhD \xa0 \xa0 +249912957764 \xa0 \xa0 ', 'khalid2_3456@yahoo.com', u' \xa0 \xa0 ', u'Principal Investigator: Hamdan Z Hamdan, MBBS,MSc \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 ', 'Principal Investigator:', '\r\n ', 'Hamdan Z Hamdan, MBBS, MSc', '\r\n ', 'Al-Neelain University', '\r\n ' ]

2 answers

User

#1 · edited 2024-05-17 07:35:02

You can try the following:

>>> import re
>>> [l for l in (re.findall(r"[^\d](\d{4})[^\d]",s) for s in list1) if l]
[['1111'], ['3456']]
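
One caveat with this pattern: each [^\d] has to consume a real character, so a four-digit number at the very start or end of a string is missed. Below is a minimal sketch of one way around that, assuming lookarounds are acceptable here. The helper name four_digit_runs and the sample strings are made up for illustration.

```python
import re

# Lookarounds assert "no digit on this side" without consuming a character,
# so the match also works at the very start or end of a string.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")

def four_digit_runs(strings):
    """Return one list of matches per string, skipping strings with none."""
    return [m for m in (FOUR_DIGITS.findall(s) for s in strings) if m]

# Hypothetical sample: the first entry has no characters around the number.
sample = ['1111', 'Khartoum, Sudan, 1111 ', 'khalid2_3456@yahoo.com', '+249912468264']
print(four_digit_runs(sample))
```

The 12-digit phone number is still rejected, because every four-digit window inside it has a digit on at least one side.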

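If you are only interested in four-digit numbers that sit on word boundaries, use \b on both sides of the pattern instead, as in r"\b\d{4}\b".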
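
A sketch of what a word-boundary version of the comprehension might look like, swapping the [^\d] guards for \b. This is an illustration, not necessarily the answerer's exact snippet, and the name excerpt is an assumption: it holds only a few entries of list1 so the block runs on its own.

```python
import re

# A few entries taken from list1 in the question.
excerpt = ['+249912468264', 'Khartoum, Sudan, 1111 ', 'khalid2_3456@yahoo.com']

# Same shape as the comprehension above, but with \b guards and no capture group.
matches = [l for l in (re.findall(r"\b\d{4}\b", s) for s in excerpt) if l]
print(matches)
```

With these three strings only the postal-code-like 1111 survives; the 3456 in the e-mail address is no longer reported.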

User

#2 · edited 2024-05-17 07:35:02

You can use \b in a regular expression to denote a word boundary, so the following works:

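import re

# Print the first standalone four-digit number found in each string.
for s in list1:
    m = re.search(r'\b\d{4}\b', s)
    if m:
        print(m.group(0))  # parenthesized form works on Python 2 and 3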

...which prints only 1111. The Python documentation for the re module explains further:

\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. [...]
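
To make that definition concrete, here is a small sketch that probes a few strings with the same pattern. The helper first_match and the last two test strings are hypothetical, added only to contrast a word character with a non-word character next to the digits.

```python
import re

pattern = re.compile(r'\b\d{4}\b')

def first_match(text):
    """Return the first word-bounded four-digit run, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

for text in ['Khartoum, Sudan, 1111 ',   # spaces on both sides: boundary
             'khalid2_3456@yahoo.com',   # "_" is a word character: no boundary
             'id-3456@x',                # "-" and "@" are not: boundary
             'abc3456']:                 # letter directly before: no boundary
    print('%r -> %r' % (text, first_match(text)))
```

This is why 3456 from the e-mail address shows up with the [^\d] pattern but not with \b: the underscore in front of it belongs to the same "word".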
