Return rows that do not match a regex pattern

3 answers

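User

Answer 1 · Edited 2024-09-29 22:35:51

You can use ~ as "not": it inverts the boolean mask that str.contains returns, so only the rows where the pattern is not found are kept: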

pat = r'\b.[YF]\w+[LFI]\b'
new_df[~new_df.Sequence.str.contains(pat)]

#   Sequence    Rating
#1  YGEIFEKF    2
#3  YLESFYKF    4
#5  WPDVIHSF    6
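A self-contained version of the snippet above, as a sketch. It assumes new_df holds the six Sequence/Rating rows that appear in the outputs in this thread; the DataFrame construction is rebuilt by hand from those outputs, not taken from the question itself:

```python
import pandas as pd

# Assumed data: rebuilt from the rows shown in this thread's outputs
new_df = pd.DataFrame({
    "Sequence": ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
                 "YLESFYKF", "YYNTAVKL", "WPDVIHSF"],
    "Rating": [1, 2, 3, 4, 5, 6],
})

pat = r'\b.[YF]\w+[LFI]\b'

# str.contains gives a boolean Series (True = pattern found);
# ~ flips it, so indexing keeps only the non-matching rows
non_matching = new_df[~new_df.Sequence.str.contains(pat)]
print(non_matching)
```

The result keeps the original index labels 1, 3 and 5, as in the output above.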

User

Answer 2 · Edited 2024-09-29 22:35:51

Just negate the existing boolean Series:

df[~df.Sequence.str.contains(pat)]

This gives you the desired output:

   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6

Brief explanation:

df.Sequence.str.contains(pat)

returns a boolean Series:

0     True
1    False
2     True
3    False
4     True
5    False
Name: Sequence, dtype: bool

Negating it with ~ gives:

~df.Sequence.str.contains(pat)

0    False
1     True
2    False
3     True
4    False
5     True
Name: Sequence, dtype: bool

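This is another boolean Series, which you can pass to the original DataFrame to select only the rows where the pattern was not found.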
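A runnable sketch of the explanation above. It assumes df holds the six rows that appear in this thread's outputs (rebuilt by hand, since the question's own DataFrame is not shown). It checks that the mask and its inverse split the frame into two disjoint parts, and shows why ~ is needed instead of Python's not:

```python
import pandas as pd

# Assumed data: rebuilt from the rows shown in this thread's outputs
df = pd.DataFrame({
    "Sequence": ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
                 "YLESFYKF", "YYNTAVKL", "WPDVIHSF"],
    "Rating": [1, 2, 3, 4, 5, 6],
})
pat = r'\b.[YF]\w+[LFI]\b'

mask = df.Sequence.str.contains(pat)   # True where the pattern is found
inverted = ~mask                       # element-wise NOT on the boolean Series

# Every row lands in exactly one of the two selections
matching = df[mask]
non_matching = df[inverted]
assert len(matching) + len(non_matching) == len(df)
assert matching.index.intersection(non_matching.index).empty

# Python's `not` asks for the truth value of the whole Series,
# which pandas refuses with a ValueError
try:
    not mask
except ValueError as exc:
    print("not mask ->", exc)
```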

User

Answer 3 · Edited 2024-09-29 22:35:51

Psidom's answer is more elegant, but another way to solve this is to change the regex pattern to use a negative lookahead and then use match() instead of contains():

pat = r'\b.[YF]\w+[LFI]\b'
not_pat = r'(?!{})'.format(pat)

>>> new_df[new_df.Sequence.str.match(pat)]
   Sequence  Rating
0  HYHIVQKF       1
2  TYGGSWKF       3
4  YYNTAVKL       5

>>> new_df[new_df.Sequence.str.match(not_pat)]
   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6
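A sketch of why the lookahead version works, using the re module directly rather than pandas. It relies on two facts: str.match anchors like re.match (only position 0 is tried), while str.contains with regex=True searches like re.search. The string odd below is a hypothetical input, not from the question, chosen so the pattern occurs after position 0; the (?!.*...) variant is likewise an illustrative adjustment, not part of the answer above:

```python
import re

pat = r'\b.[YF]\w+[LFI]\b'
not_pat = r'(?!{})'.format(pat)          # the negative lookahead from this answer
not_pat_any = r'(?!.*{})'.format(pat)    # illustrative variant: look past any prefix

def kept_by_match(p, s):
    # like pandas str.match: re.match only tries position 0
    return re.match(p, s) is not None

def kept_by_contains(s):
    # roughly like pandas ~str.contains(regex=True): "not re.search"
    return re.search(pat, s) is None

sequences = ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
             "YLESFYKF", "YYNTAVKL", "WPDVIHSF"]
for s in sequences:
    # a lookahead consumes nothing, so a successful match is zero-length
    print(s, kept_by_match(not_pat, s), kept_by_contains(s))

# Hypothetical string where the pattern starts after position 0
odd = "XX HYHIVQKF"
print(odd, kept_by_match(not_pat, odd),
      kept_by_match(not_pat_any, odd), kept_by_contains(odd))
```

On the six sequences from this thread both approaches agree, because every \b in those strings sits at the start or the end. On odd, the plain lookahead keeps a row that ~str.contains would drop; the (?!.*...) variant closes that gap for single-line strings, since . does not match a newline by default.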
