Regular expression for cleaning names

2 answers

User

#1 · edited 2024-09-30 01:24:29

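It looks like what you want here is the first sequence of word characters that has no comma anywhere after it, not a sequence that is preceded by a comma. So you need a negative lookahead assertion rather than a positive one.

Try the following regular expression: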

r'\w+(?!.*,)'
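
To make the effect of the negative lookahead concrete, here is a minimal sketch using only the standard-library `re` module. The variable names are illustrative. It lists every word that passes the "no comma anywhere after it" test; `re.search` would return the first of these.

```python
import re

pattern = re.compile(r'\w+(?!.*,)')

# Without a comma in the string, every word passes the lookahead.
no_comma = pattern.findall('JOSEPH W. JOHN')

# With a comma, everything before it is rejected, because a comma
# still follows; matching only starts after the comma.
with_comma = pattern.findall('AAMIR, DENNIS M')

print(no_comma)    # ['JOSEPH', 'W', 'JOHN']
print(with_comma)  # ['DENNIS', 'M']
```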

Apply it like this:

import re
df['name'].apply(lambda name: re.search(r'\w+(?!.*,)', name).group())

Applying the above to this example DataFrame:

                name   foo
0     JOSEPH W. JOHN     1
1     MIMI N. ALFORD     3
2         WANG E. Li     3
3    AAMIR, DENNIS M     3
4  MAHAMMED, LINDA X     3
5     ABAD, FARLEY J     3

gives:

0    JOSEPH
1      MIMI
2      WANG
3    DENNIS
4     LINDA
5    FARLEY
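
One caveat with the lambda above: `re.search` returns `None` when nothing matches, so calling `.group()` on an empty or missing name raises an error. A minimal sketch of a safer helper follows. The name `first_name` is hypothetical, and it assumes that returning `None` for unusable values is acceptable.

```python
import re

FIRST_NAME = re.compile(r'\w+(?!.*,)')

def first_name(name):
    """Return the first name, or None for non-strings and non-matching strings."""
    if not isinstance(name, str):
        return None
    match = FIRST_NAME.search(name)
    return match.group() if match else None

names = ['JOSEPH W. JOHN', 'AAMIR, DENNIS M', '', None]
result = [first_name(n) for n in names]
print(result)  # ['JOSEPH', 'DENNIS', None, None]
```

With pandas, the helper could be passed directly: df['name'].apply(first_name).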

User
#2 · edited 2024-09-30 01:24:29

Use
df['First Name'] = df['name'].str.extract(r'(?:(?<=^(?!.*,))|(?<=, ))([A-Z]+)', expand=False)
See proof.
Explanation
(?:                    group, but do not capture:
  (?<=                   look behind to see if there is:
    ^                      the beginning of the string
    (?!                    look ahead to see if there is not:
      .*                     any character except \n (0 or more times (matching the most amount possible))
      ,                      ','
    )                      end of look-ahead
  )                      end of look-behind
 |                      OR
  (?<=                   look behind to see if there is:
    ,                      ', '
  )                      end of look-behind
)                      end of grouping
(                      group and capture to \1:
  [A-Z]+                 any character of: 'A' to 'Z' (1 or more times (matching the most amount possible))
)                      end of \1
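
The pattern can be checked without pandas. The sketch below runs it through the standard-library `re` module on the sample names from the first answer. It assumes `re.search` is a fair stand-in here, since `str.extract` also takes the capture group from the first match in each string.

```python
import re

pattern = re.compile(r'(?:(?<=^(?!.*,))|(?<=, ))([A-Z]+)')

names = [
    'JOSEPH W. JOHN',
    'MIMI N. ALFORD',
    'WANG E. Li',
    'AAMIR, DENNIS M',
    'MAHAMMED, LINDA X',
    'ABAD, FARLEY J',
]

firsts = []
for name in names:
    match = pattern.search(name)
    # group(1) is the [A-Z]+ capture; None if the name has no match
    firsts.append(match.group(1) if match else None)

print(firsts)
```

Note that `[A-Z]+` only accepts uppercase letters, so a mixed-case first name would be cut off at the first lowercase letter, unlike with `\w+`.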
