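I asked a question before about using a regex to extract matching name and email groups as tuples from long email body text. The solution works very well, for example for extracting the names and emails from the following text: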
> Begin forwarded message:
> Date: December 20, 2013 at 11:32:39 AM GMT-3
> Subject: My dummy subject
> From: Charlie Brown <aaa@aa-aaa.com>
> To: maria.brown@aaa.com, George Washington <george@washington.com>, =
thomas.jefferson@aaa.com, thomas.alva.edison@aaa.com, Juan =
<juan@aaa.com>, Alan <alan@aaa.com>, Alec <alec@aaa.com>, =
Alejandro <aaa@aaa.com>, Alex <aaa@planeas.com>, Andrea =
<andrea.mery@thomsen.cl>, Andrea <andrea.22@aaa.com>, Andres =
<andres@aaa.com>, Andres <avaldivieso@aaa.com>
> Hi,
> Please reply ASAP with your RSVP
> Bye
Using this regex:
[:,]\s*=?\s*(?:([A-Z][a-z]+(?:\s[A-Z][a-z]+)?))?\s*=?\s*.*?([\w.]+@[\w.-]+)
produces this output:
[('Charlie Brown', 'aaa@aaa.com'), ('', 'maria.brown@aaa.com'), ('George Washington', 'george@washington.com'), ('', 'thomas.jefferson@aaa.com'), ('', 'thomas.alva.edison@aaa.com'), ('Juan', 'juan@aaa.com'), ('Alan', 'alan@aaa.com'), ('Alec', 'alec@aaa.com'), ('Alejandro', 'aaa@aaa.com'), ('Alex', 'aaa@aaa.com'), ('Andrea', 'andrea.mery@thomsen.cl'), ('Andrea', 'andrea.22@aaa.com'), ('Andres', 'andres@aaa.com'), ('Andres', 'avaldivieso@aaa.com')]
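For reference, here is a minimal Python sketch of how the pattern above can be applied. It assumes the standard re module and findall (the list-of-tuples output suggests this, but the question does not say so). The sample is a shortened excerpt of the message above, and the names PATTERN, sample and pairs are illustrative. Because the pattern has two capture groups, findall returns one (name, email) tuple per match, and a skipped optional name group comes back as an empty string.

```python
import re

# Pattern from the question: an optional capitalised name (one or two words)
# followed by the first email address after a ":" or "," separator.
PATTERN = re.compile(
    r"[:,]\s*=?\s*(?:([A-Z][a-z]+(?:\s[A-Z][a-z]+)?))?\s*=?\s*.*?([\w.]+@[\w.-]+)"
)

# Shortened excerpt of the forwarded message; "=" at a line end is a
# quoted-printable soft line break.
sample = (
    "From: Charlie Brown <aaa@aa-aaa.com>\n"
    "To: maria.brown@aaa.com, George Washington <george@washington.com>, =\n"
    "thomas.jefferson@aaa.com, Juan =\n"
    "<juan@aaa.com>, Alan <alan@aaa.com>\n"
)

# Two capture groups, so findall yields (name, email) tuples;
# an unmatched name group is returned as ''.
pairs = PATTERN.findall(sample)
for name, email in pairs:
    print(repr(name), email)
```

Since . does not match a newline by default, it is the \s*=?\s* pieces that let a match continue across the soft line breaks (as in "Juan =" followed by "<juan@aaa.com>" on the next line).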
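However, I've come across a case where the names in the text I pass to the regex contain special accented characters. How can I update the regex above so that it doesn't break and captures names containing accented characters such as: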
"á", "é", "í", "ó", "ú", "ç", "ö", "ü", "ñ", "à", "è", "ì", "ò", "ù"
(and their uppercase versions)
Thanks!
Use the regex module instead of re to support Unicode regexes. I just changed [a-z]+ to \p{L}+ (which matches any kind of letter from any language).
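A sketch of this change, assuming Python. The built-in re module does not accept \p{L}, so the block uses the third-party regex package when it is installed. Otherwise, as an assumption of this sketch and not part of the answer, it falls back to re with [^\W\d_] (word characters minus digits and underscore), which approximates \p{L}. The sample names and the variable names are made up for illustration. It runs the original ASCII-only pattern and the updated one on the same text.

```python
import re

try:
    # Third-party package (pip install regex); supports \p{L}.
    import regex as unicode_re
    LETTERS = r"\p{L}+"
except ImportError:
    # Fallback assumed by this sketch, not part of the answer: in str
    # patterns [^\W\d_] means "word character that is not a digit or
    # underscore", an approximation of \p{L} for the standard re module.
    unicode_re = re
    LETTERS = r"[^\W\d_]+"

# The question's pattern, unchanged (ASCII-only names).
ORIGINAL = r"[:,]\s*=?\s*(?:([A-Z][a-z]+(?:\s[A-Z][a-z]+)?))?\s*=?\s*.*?([\w.]+@[\w.-]+)"

# Same pattern with each [a-z]+ replaced by the "any letter" class.
NAME = rf"[A-Z]{LETTERS}"
UPDATED = rf"[:,]\s*=?\s*(?:({NAME}(?:\s{NAME})?))?\s*=?\s*.*?([\w.]+@[\w.-]+)"

# Made-up sample with accented names and one soft line break.
sample = "To: José Muñoz <jose@aaa.com>, Andrés =\n<andres@aaa.com>, Zoë <zoe@aaa.com>\n"

ascii_pairs = re.findall(ORIGINAL, sample)
unicode_pairs = unicode_re.findall(UPDATED, sample)

print(ascii_pairs)    # → [('Jos', 'jose@aaa.com'), ('Zo', 'zoe@aaa.com')]
print(unicode_pairs)  # → [('José Muñoz', 'jose@aaa.com'), ('Andrés', 'andres@aaa.com'), ('Zoë', 'zoe@aaa.com')]
```

With the original pattern the names are cut at the first accented letter, and the Andrés entry is lost entirely because the truncated name can no longer consume the text before the line break. The leading [A-Z] is left as in the answer, so a name that starts with an accented capital (for example Ángel) is still not captured as a name; with the regex package, \p{Lu} could be used in that position.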