Extract a substring from a column

I have the following DataFrame:

import pandas as pd

data = {'Name': ['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG', 'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr']}
df = pd.DataFrame(data)
print(df)

                  Name
0    inf.negem.netmgmt
1            infbe_cdb
2              inf_igh
3           INF_EONLOG
4  inf.dkprime.netmgmt
5           infaus_mgo
6            infau_abr

I tried the following code, but it does not give the result I want:

df['Country'] = df['Name'].str.slice(3, 6)

I would like to see output like this:

output = {'Country': ['No_Country', 'be', 'No_Country', 'No_Country', 'No_Country', 'aus', 'au']}
df = pd.DataFrame(output)
print(df)

      Country
0  No_Country
1          be
2  No_Country
3  No_Country
4  No_Country
5         aus
6          au

Note: I would like to extract the text between 'inf' and '_' as the country and store it in a new column named Country. If nothing is found between 'inf' and '_', the value should be 'No_Country'.
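A quick check of why the fixed-position slice falls short: `str.slice(3, 6)` always returns characters 3 to 5 of each name, wherever the '_' happens to be, so it cannot cope with codes of different lengths ('be' vs 'aus'). This minimal sketch uses only the sample names from the question.

```python
import pandas as pd

names = pd.Series(['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
                   'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr'])

# str.slice(3, 6) takes positions 3..5 of every string, ignoring its content.
print(names.str.slice(3, 6).tolist())
# ['.ne', 'be_', '_ig', '_EO', '.dk', 'aus', 'au_']
```

Only 'infaus_mgo' happens to line up; every other name picks up a stray '_', '.' or letter, which is why a pattern anchored on 'inf' and '_' is needed instead.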

2 answers

User

Answer 1 · edited 2024-05-07 14:12:37

Use a list comprehension with re.findall:

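import re
# 'inf(.*?)_' captures the shortest text between 'inf' and the next '_'; with no
# match (no '_', or uppercase 'INF'), findall returns [] and ''.join gives ''.
df['Country'] = ["".join(re.findall(r'inf(.*?)_', i)) for i in df['Name']]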


print(df)
                  Name Country
0    inf.negem.netmgmt        
1            infbe_cdb      be
2              inf_igh        
3           INF_EONLOG        
4  inf.dkprime.netmgmt        
5           infaus_mgo     aus
6            infau_abr      au
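The list comprehension above leaves an empty string where the question expects 'No_Country'. A small variation, sketched under the assumption that an uppercase 'INF' prefix should be treated like 'inf' (the second answer makes the same assumption via str.lower()): `re.IGNORECASE` covers the case, and `or 'No_Country'` replaces the empty string produced by either no match or an empty capture such as 'inf_igh'.

```python
import re

import pandas as pd

data = {'Name': ['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
                 'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr']}
df = pd.DataFrame(data)

# Same pattern as above, matched case-insensitively (assumption: 'INF' == 'inf').
# ''.join(...) is '' for no match or an empty capture; `or` swaps in 'No_Country'.
df['Country'] = ["".join(re.findall(r'inf(.*?)_', name, flags=re.IGNORECASE)) or 'No_Country'
                 for name in df['Name']]
print(df)
```

This keeps the answer's loop-based style; for large frames the vectorized str.extract route in the next answer is usually the more idiomatic pandas choice.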

User

Answer 2 · edited 2024-05-07 14:12:37

Here is one way using str.extract:

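# Lower-case first so 'INF_EONLOG' is matched too; expand=False returns a Series
df['Country'] = (df.Name.str.lower()
                        .str.extract(r'inf(.*?)_', expand=False)
                        .replace('', float('nan'))   # empty capture ('inf_igh') -> NaN
                        .fillna('No_Country'))       # no match or empty -> 'No_Country'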

print(df)

                  Name     Country
0    inf.negem.netmgmt  No_Country
1            infbe_cdb          be
2              inf_igh  No_Country
3           INF_EONLOG  No_Country
4  inf.dkprime.netmgmt  No_Country
5           infaus_mgo         aus
6            infau_abr          au
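A possible simplification of the str.extract approach, sketched under the assumption (not stated in the question) that 'inf' always starts the name, hence the `^` anchor. `[^_]+` needs at least one non-underscore character, so 'inf_igh' yields NaN directly and the `.replace('', nan)` step can be dropped; `flags=re.IGNORECASE` stands in for `.str.lower()`.

```python
import re

import pandas as pd

df = pd.DataFrame({'Name': ['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
                            'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr']})

# '^inf' assumes the prefix is at the start; '([^_]+)' cannot be empty,
# so names with nothing between 'inf' and '_' (or no '_' at all) give NaN.
df['Country'] = (df['Name']
                 .str.extract(r'^inf([^_]+)_', flags=re.IGNORECASE, expand=False)
                 .fillna('No_Country'))
print(df)
```

One side effect of dropping `.str.lower()`: captured codes keep their original case, so a hypothetical name like 'INFDE_x' would yield 'DE' rather than 'de'.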
