How can I include special parts when using Python's re.findall to find English names?

Posted 2024-09-25 02:40:21

I have Python code like the following to search for all English names:

import re

a = "Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in {{city-state|Las Vegas|Nevada}} Barry Bonds"

re.findall(r"(?:[A-Z][a-z'.]+\s*){1,4}", a)

I want it to return:

['Bonds', 'Susann (&quot;Sun&quot;) Margreth Branco', 'Montreal', 'Quebec', 'August', 'They', 'Las Vegas', 'Nevada', 'Barry Bonds']

My code does not give me what I want. How should I modify the regex to get this result?

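I would like to add that I also tried another regular expression, (?:(([A-Z][a-z'.]+)|(\(&quot;.*&quot;\)))\s*){1,4}. I tested it on regexpal.com, where it finds what I want. In Python, however, it does not: it returns Susann, (&quot;Sun&quot;) and Margreth Branco as three separate pieces, whereas I want Susann (&quot;Sun&quot;) Margreth Branco as a single result.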
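One likely reason for the difference between the test site and Python: when a pattern contains capturing groups, re.findall returns the captured groups rather than the whole match. The sketch below shows this with the second pattern and then a variant with only non-capturing groups, read through finditer and group(0). It assumes the input really contains literal &quot; entities, as the patterns in this thread suggest; the names text, grouped, pattern and names are illustrative only, and .* has been made lazy in the variant.

```python
import re

# Assumption: the input holds literal &quot; entities, as the patterns here suggest.
text = ("Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his "
        "first two children, in {{city-state|Montreal|Quebec}} in August 1987. "
        "They eloped in {{city-state|Las Vegas|Nevada}} Barry Bonds")

# With capturing groups, re.findall returns one tuple of groups per match,
# not the text of the whole match.
grouped = re.findall(r"(?:(([A-Z][a-z'.]+)|(\(&quot;.*&quot;\)))\s*){1,4}", text)

# Same idea with every group made non-capturing (and .* made lazy so it stops
# at the first closing &quot;). finditer plus group(0) yields the full match;
# strip() drops trailing whitespace that \s* may have consumed.
pattern = re.compile(r"(?:(?:[A-Z][a-z'.]+|\(&quot;.*?&quot;\))\s*){1,4}")
names = [m.group(0).strip() for m in pattern.finditer(text)]
print(names)
```

Keeping the groups non-capturing also lets plain re.findall return whole matches, although the trailing whitespace would still need stripping.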


1 answer
User
#1 · Posted 2024-09-25 02:40:21

As you mentioned, substrings containing &quot; can be treated as separators as well:

import re

re.findall(r"[A-Z][a-z]*(?:(?:\S*&quot\S*|\s)+[A-Z][a-z]*){0,3}", "Bonds met Susann (&quot;Sun&quot;) Margreth Branco, the mother of his first two children, in {{city-state|Montreal|Quebec}} in August 1987. They eloped in {{city-state|Las Vegas|Nevada}} Barry Bonds")

Output:

['Bonds', 'Susann (&quot;Sun&quot;) Margreth Branco', 'Montreal', 'Quebec', 'August', 'They', 'Las Vegas', 'Nevada', 'Barry Bonds']
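To make the pattern easier to read, here is a rough sketch of the same expression written with re.VERBOSE and comments, plus an optional html.unescape step that turns &quot; back into a plain double quote in the results. The names NAME_RE, find_names and sample are made up for illustration, and the decoding step is an assumption about what the final output should look like.

```python
import html
import re

# Same pattern as in the answer, spread out with comments (re.VERBOSE ignores
# the layout whitespace and the comments).
NAME_RE = re.compile(r"""
    [A-Z][a-z]*                # first capitalized word
    (?:
        (?:\S*&quot\S*|\s)+    # separator: whitespace, or a non-space run containing &quot
        [A-Z][a-z]*            # next capitalized word
    ){0,3}                     # up to three further words
""", re.VERBOSE)


def find_names(text):
    """Return the matched names with HTML entities decoded (hypothetical helper)."""
    return [html.unescape(m) for m in NAME_RE.findall(text)]


sample = "They met Susann (&quot;Sun&quot;) Margreth Branco in Las Vegas"
print(find_names(sample))
# → ['They', 'Susann ("Sun") Margreth Branco', 'Las Vegas']
```

Because the pattern has no capturing groups, findall returns each whole match as a string, so no post-processing beyond the optional decoding is needed.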
