Splitting a list of names where two names may share a surname

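```python
import re

# Group 1: a full "First Last" name.
# Group 2: a lone given name; group 3 (captured inside the lookahead) is the
# surname of the full name that follows it after "and".
p = re.compile(r'([A-Z]\w+\s+[A-Z]\w+)|([A-Z]\w+)(?=\s+and\s+[A-Z]\w+\s+([A-Z]\w+))', re.MULTILINE)
test_str = ("Russ Middleton and Lisa Murro\n"
            "Ron Iervolino, Trish and Russ Middleton, and Lisa Middleton \n"
            "Ron Iervolino, Kelly and Tom Murro\n"
            "Ron Iervolino, Trish and Russ Middleton and Lisa Middleton ")
# Python uses \1-style backreferences, not $1. Since Python 3.5, groups that
# did not take part in the match are replaced with an empty string.
subst = r"\1\2 \3"
result = p.sub(subst, test_str)
```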

2 Answers

Anonymous user
Answer 1 · edited 2024-10-02 08:22:27

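This should give you an idea. First apply this pattern:
`([A-Z]\w+\s+[A-Z]\w+)|([A-Z]\w+)(?=\s+and\s+[A-Z]\w+\s+([A-Z]\w+))`
and replace with `$1$2 $3`. That is the syntax used by online regex testers; with Python's `re.sub`, write the replacement as `r'\1\2 \3'`.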
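The substitution rewrites the names in place but does not produce a list. A minimal sketch, assuming Python 3 and the same pattern, that collects the names with `re.finditer` and builds each one from the groups directly. This also avoids the extra space that `\1\2 \3` leaves after full names. `extract_names` is a hypothetical helper name.

```python
import re

# Same pattern as above: group 1 is a full "First Last" name; group 2 is a lone
# given name, and group 3 (captured inside the lookahead) is the surname it borrows.
NAME_RE = re.compile(
    r'([A-Z]\w+\s+[A-Z]\w+)|([A-Z]\w+)(?=\s+and\s+[A-Z]\w+\s+([A-Z]\w+))'
)

def extract_names(line):
    """Return the full names found in one line of text (hypothetical helper)."""
    names = []
    for m in NAME_RE.finditer(line):
        if m.group(1):
            # Already a full name; collapse any internal whitespace.
            names.append(' '.join(m.group(1).split()))
        else:
            # Lone given name: attach the surname found by the lookahead.
            names.append(m.group(2) + ' ' + m.group(3))
    return names

print(extract_names("Russ Middleton and Lisa Murro"))
print(extract_names("Ron Iervolino, Kelly and Tom Murro"))
```

Like the pattern itself, this only looks one name ahead, so in a chain such as `Kelly and Tom and Rose Murro` the first given name (`Kelly`) is not matched at all.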

Anonymous user
Answer 2 · edited 2024-10-02 08:22:27

For the first case, a more efficient approach is `str.split()`, provided your string is already separated by commas:
```python
>>> s = ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton '
>>> [i.split('and', 1)[1] if i.strip().startswith('and ') else i for i in s.split(',')]
[' Ron Iervolino', ' Trish Iervolino', ' Russ Middleton', ' Lisa Middleton ']
```
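The comma split and the leading-`and` check can also be folded into a single `re.split` call. This is a minimal sketch, assuming entries are separated only by commas, the word `and`, or `, and`. An `and` inside a comma-separated piece (as in `Kelly and Tom Murro`) is split too, so the shared-surname case still needs separate handling.

```python
import re

s = ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton '

# Split on a comma (optionally followed by "and") or on a standalone "and";
# \b keeps words that merely start with "and" (e.g. "andrew") from being cut.
parts = [p for p in re.split(r'\s*,\s*(?:and\b)?\s*|\s+and\s+', s.strip()) if p]
print(parts)
```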
To find the names in a string like `' Kelly and Tom Murro '`, you can use the following:
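```python
import re

s = ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ,Kelly and Tom Murro'
result = []
for part in s.split(','):
    part = part.strip()
    if part.startswith('and '):
        # ", and Lisa Middleton": drop the leading "and"
        result.append(part[len('and '):].strip())
    elif not part.endswith(' and') and re.search(r'\band\b', part):
        # "Kelly and Tom Murro": split on the word "and" and on whitespace
        names = [n for n in re.split(r'\band\b|\s+', part) if n]
        # pair every given name with the last element (the shared surname)
        for t in zip(names[:-1], [names[-1]] * (len(names) - 1)):
            result.append(' '.join(t))
    else:
        result.append(part)
print(result)
# ['Ron Iervolino', 'Trish Iervolino', 'Russ Middleton', 'Lisa Middleton', 'Kelly Murro', 'Tom Murro']
```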
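When you encounter a string like `' Kelly and Tom Murro '`, first split it into a list of names: the list comprehension over `re.split(...)` splits the string on `and` and on spaces and discards the empty strings, so you get `['Kelly', 'Tom', 'Murro']`. What you then need are these names:
`'Kelly Murro'`, `'Tom Murro'`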
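You can build them with `zip`, pairing the names from the start of the list up to the last one (`names[:-1]`) with the last element repeated, which gives:
`[('Kelly', 'Murro'), ('Tom', 'Murro')]`
Note that this approach also works for longer lists of names, such as (Kelly and Tom and Rose and Sarah Murro).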
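The `zip` trick treats every word before the last as a given name, so a piece that already contains two full names, such as `Russ Middleton and Lisa Murro` from the question, comes out wrong (it yields entries like `Middleton Murro`). A minimal sketch that handles both forms, assuming a lone given name takes the surname of the next full name joined to it by `and` within the same comma-separated piece. `split_names` is a hypothetical helper, not part of either answer.

```python
import re

def split_names(text):
    """Expand 'First and First Last' runs so every entry has a surname.

    Hypothetical helper. Assumes entries are separated by commas and/or the
    word "and", and that a lone given name borrows the surname of the next
    full name joined to it by "and" within the same comma-separated piece.
    """
    result = []
    for piece in text.split(','):
        # Drop the leading "and" of an Oxford-comma piece like ", and Lisa Middleton".
        piece = re.sub(r'^and\b\s*', '', piece.strip())
        if not piece:
            continue
        entries = re.split(r'\s+and\s+', piece)
        surname = None
        expanded = []
        # Walk right to left so each lone given name sees the surname after it.
        for entry in reversed(entries):
            words = entry.split()
            if len(words) > 1:
                surname = words[-1]
                expanded.append(' '.join(words))
            elif surname is not None:
                expanded.append(words[0] + ' ' + surname)
            else:
                expanded.append(words[0])  # nothing to borrow; keep as-is
        result.extend(reversed(expanded))
    return result

for line in ["Russ Middleton and Lisa Murro",
             "Ron Iervolino, Trish and Russ Middleton and Lisa Middleton ",
             "Kelly and Tom and Rose and Sarah Murro"]:
    print(split_names(line))
```

Walking right to left is what lets a whole chain such as `Kelly and Tom and Rose and Sarah Murro` share one surname, while entries that already have their own surname keep it.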
