Find a string in a column and categorize the rows

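import numpy as np
import pandas as pd

# Lookup table: rows with a non-null 'find' map a keyword to a category.
df_unique = pd.DataFrame({
    'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                 'ram is a boy', 'sita is a girl'],
    'find': ['boy', 'girl', np.nan, np.nan, np.nan],
    'category': ['male', 'female', np.nan, np.nan, np.nan],
})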

df_master_final = pd.DataFrame({
    'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                 'ram is a boy', 'sita is a girl', 'John is a boy',
                 'amie is a girl'],
    'category': ['male', 'female', 'female', 'male', 'female', 'male',
                 'female'],
})

1 answer

User

#1 · Posted on 2024-06-25 22:46:02

This is a fully vectorized solution. It assumes that at most one search key can (or should) match each sentence.

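import re

import pandas as pd

# Preparing the result dataframe. df_master is the input frame with a
# 'sentence' column; copy it so the original is not modified in place.
df_master_2 = df_master.copy()

# Preparing the lookup dataframe so it contains only non-empty mappings.
mapped = df_unique.loc[df_unique['find'].notna(), ['find', 'category']]

# Extracting the lookup values from the sentence column.
# re.escape guards against keys that contain regex metacharacters;
# expand=False makes str.extract return a Series rather than a DataFrame.
keys_re = '({})'.format('|'.join(re.escape(key) for key in mapped['find']))
df_master_2['find'] = df_master_2['sentence'].str.extract(keys_re, expand=False)

# Joining the category. how='left' keeps sentences that match no key
# (their category is NaN) instead of silently dropping them.
df_master_2 = pd.merge(df_master_2, mapped, on='find', how='left')

# Selecting only the fields we want.
df_master_2 = df_master_2[['sentence', 'category']]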
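
For reference, here is a minimal end-to-end sketch of the same extract-then-merge approach that can be run as-is. The question does not show `df_master`, so the input frame below is an assumption: it simply holds the sentences of `df_master_final` without the category column. The helper name `categorize` is also hypothetical, introduced only for illustration.

```python
import re

import numpy as np
import pandas as pd

# Lookup table from the question: only rows with a non-null 'find' are usable.
df_unique = pd.DataFrame({
    'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                 'ram is a boy', 'sita is a girl'],
    'find': ['boy', 'girl', np.nan, np.nan, np.nan],
    'category': ['male', 'female', np.nan, np.nan, np.nan],
})

# ASSUMPTION: the input frame is not shown in the question; this one is
# built from the sentences of df_master_final.
df_master = pd.DataFrame({
    'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                 'ram is a boy', 'sita is a girl', 'John is a boy',
                 'amie is a girl'],
})


def categorize(master, lookup):
    """Hypothetical helper: label each sentence via keyword lookup."""
    # Keep only the keyword -> category pairs that are actually filled in.
    mapped = lookup.loc[lookup['find'].notna(), ['find', 'category']]
    # One capture group holding every keyword as an alternative.
    pattern = '({})'.format('|'.join(re.escape(k) for k in mapped['find']))
    out = master.copy()
    # The first keyword found in each sentence (NaN when none matches).
    out['find'] = out['sentence'].str.extract(pattern, expand=False)
    # A left join keeps every input row, matched or not.
    out = out.merge(mapped, on='find', how='left')
    return out[['sentence', 'category']]


result = categorize(df_master, df_unique)
print(result)
```

If a sentence could contain more than one key, only the first match found is used, which is why the single-match assumption matters.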
