Converting regex-matched values to 0 or 1 in Pandas

df['Name'] = df['Name'].replace(r"^(.(?=anaphylaxis))*?$", 1, regex=True)
df['Name'] = df['Name'].replace(r"^(.(?<!anaphylaxis))*?$", 0, regex=True)

  ID  Name
  84  Drug-induced anaphylaxis
1041  Acute anaphylaxis
1194  Anaphylactic reaction
1483  Anaphylactic reaction, due to adverse effect o...
2226  Anaphylaxis, initial encounter
2428  Anaphylaxis
2831  Anaphylactic shock
4900  Other anaphylactic reaction

2 Answers

User

#1 · Edited 2024-07-02 11:37:36

Use str.contains for case-insensitive matching.

import re
df['Name'] = df['Name'].str.contains(r'anaphylaxis', flags=re.IGNORECASE).astype(int)

Or, more concisely:

df['Name'] = df['Name'].str.contains(r'(?i)anaphylaxis').astype(int)

df
     ID  Name
0    84     1
1  1041     1
2  1194     0
3  1483     0
4  2226     1
5  2428     1
6  2831     0
7  4900     0
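For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the question's data and applies the same str.contains call. The column name has_anaphylaxis is a hypothetical name chosen here so that the original Name column is kept for comparison; the truncated name is copied as it appears in the question.

```python
import re

import pandas as pd

# Sample data copied from the question; the truncated entry is kept verbatim.
df = pd.DataFrame({
    "ID": [84, 1041, 1194, 1483, 2226, 2428, 2831, 4900],
    "Name": [
        "Drug-induced anaphylaxis",
        "Acute anaphylaxis",
        "Anaphylactic reaction",
        "Anaphylactic reaction, due to adverse effect o...",
        "Anaphylaxis, initial encounter",
        "Anaphylaxis",
        "Anaphylactic shock",
        "Other anaphylactic reaction",
    ],
})

# str.contains returns a boolean Series; astype(int) maps True/False to 1/0.
# "has_anaphylaxis" is a hypothetical column name, not from the original post.
df["has_anaphylaxis"] = (
    df["Name"].str.contains(r"anaphylaxis", flags=re.IGNORECASE).astype(int)
)

print(df[["ID", "has_anaphylaxis"]])
```

Writing to a separate column is only a convenience for checking the result; assigning back to df['Name'] as in the answer works the same way.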

str.contains is useful when you also want regex-based matching. In this case, though, you can probably drop regex altogether by passing regex=False, which gives better performance.

For even better performance, use a list comprehension.

import numpy as np
df['Name'] = np.array(['anaphylaxis' in x.lower() for x in df['Name']], dtype=int)

Or better still:

df['Name'] = [1 if 'anaphylaxis' in x.lower() else 0 for x in df['Name'].tolist()]

df

     ID  Name
0    84     1
1  1041     1
2  1194     0
3  1483     0
4  2226     1
5  2428     1
6  2831     0
7  4900     0
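The performance claim is easy to check on your own data. The following is a rough sketch, not a rigorous benchmark: the helper names with_str_contains and with_list_comprehension are made up for this illustration, and the sample Series is synthetic. It first confirms that both approaches agree, then times them with timeit.

```python
import timeit

import pandas as pd

# Synthetic sample: three names repeated to get a Series worth timing.
names = pd.Series(
    ["Drug-induced anaphylaxis", "Anaphylactic shock", "Anaphylaxis"] * 1000
)


def with_str_contains(s):
    # Vectorised pandas approach, literal (non-regex) match, case-insensitive.
    return s.str.contains("anaphylaxis", case=False, regex=False).astype(int).tolist()


def with_list_comprehension(s):
    # Plain Python approach: lowercase each string and test for the substring.
    return [1 if "anaphylaxis" in x.lower() else 0 for x in s.tolist()]


# Both approaches should produce identical flags before timing is meaningful.
assert with_str_contains(names) == with_list_comprehension(names)

for fn in (with_str_contains, with_list_comprehension):
    seconds = timeit.timeit(lambda: fn(names), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Timings depend on the pandas version, the dtype of the column, and the data size, so treat the numbers as indicative only.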

User

#2 · Edited 2024-07-02 11:37:36

You can use str.contains without regex. This method returns a boolean Series, which is then converted to int.

df['Name']= df['Name'].str.contains('anaphylaxis', case=False, regex=False)\
                      .astype(int)

Result:

     ID  Name
0    84     1
1  1041     1
2  1194     0
3  1483     0
4  2226     1
5  2428     1
6  2831     0
7  4900     0
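One caveat that the question's data does not exercise: if the column can contain missing values, the boolean result may carry them through (depending on the pandas version and column dtype), and the conversion to int can then fail. A small sketch, assuming missing names should count as 0, uses the na parameter of str.contains; the sample Series below is invented for illustration.

```python
import pandas as pd

# Invented sample with a missing value in the middle.
names = pd.Series(["Anaphylaxis", None, "Anaphylactic shock"])

# na=False treats missing entries as "no match", so astype(int) is safe.
flags = names.str.contains(
    "anaphylaxis", case=False, regex=False, na=False
).astype(int)

print(flags.tolist())  # [1, 0, 0]
```

The list-comprehension variants would need their own guard for missing values, since calling .lower() on None raises an AttributeError.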
