Find strings in a column and categorize rows

Published 2024-06-25 22:46:02


I need help with a problem. Please let me know what the best solution would be.

I have a master dataframe, shown below:

df_master = pd.DataFrame({'sentence' : ['John is a boy','amie is a girl','helen is a girl','ram is a boy','sita is a girl', 'John is a boy', 'amie is a girl']})

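From the DF above, I created another DF containing only the unique rows, wrote it to Excel, and added two new columns ("find" and "category"). This is what that DF ends up looking like: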

df_unique = pd.DataFrame({'sentence' : ['John is a boy','amie is a girl','helen is a girl','ram is a boy','sita is a girl'],
                         'find':['boy','girl',np.nan,np.nan,np.nan],
                          'category': ['male','female',np.nan,np.nan,np.nan]})
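A minimal sketch of how a frame like df_unique could be produced in pandas, assuming "unique rows" means dropping duplicate sentences. The manual Excel step, where "find" and "category" were filled in by hand, is simulated here by assigning the two example values directly; the Excel round trip itself is left out.

```python
import numpy as np
import pandas as pd

df_master = pd.DataFrame({'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                                       'ram is a boy', 'sita is a girl', 'John is a boy',
                                       'amie is a girl']})

# Keep the first occurrence of each sentence and renumber the index 0..n-1.
df_unique = df_master.drop_duplicates(subset='sentence').reset_index(drop=True)

# Empty lookup columns with object dtype, so strings can be written into them later.
for col in ['find', 'category']:
    df_unique[col] = pd.Series(np.nan, index=df_unique.index, dtype=object)

# Stand-in for the manual Excel edit: only the first two rows get a keyword.
df_unique.loc[0, ['find', 'category']] = ['boy', 'male']
df_unique.loc[1, ['find', 'category']] = ['girl', 'female']

print(df_unique)
```

If the Excel step is kept, the round trip would go through DataFrame.to_excel and pd.read_excel (the file name is up to you), which need an Excel engine such as openpyxl installed.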

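Now I need to modify df_master so that it ends up looking like the frame below. To get there, I would have to read the rows of df_unique one by one, search df_master's 'sentence' column for the word in the 'find' column, and then copy the matching 'category' value from df_unique into the 'category' column of df_master_final.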

df_master_final = pd.DataFrame({'sentence' : ['John is a boy','amie is a girl','helen is a girl','ram is a boy','sita is a girl', 'John is a boy', 'amie is a girl'],
                                'category': ['male','female','female','male','female','male','female']})

Note that this is only a toy example: the real df_master has about 5,000 rows and df_unique about 2,000 rows.

What would be the best way to do this? I would have to loop through both DFs, and iterrows is very slow.
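For reference, here is a sketch of the keyword-by-keyword lookup described in the question without iterrows: it loops only over the keywords in df_unique and checks the whole 'sentence' column at once on each pass. It assumes a plain substring match is acceptable (so 'boy' would also match inside 'boyfriend') and that the first keyword that matches a sentence wins.

```python
import numpy as np
import pandas as pd

df_master = pd.DataFrame({'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                                       'ram is a boy', 'sita is a girl', 'John is a boy',
                                       'amie is a girl']})
df_unique = pd.DataFrame({'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                                       'ram is a boy', 'sita is a girl'],
                          'find': ['boy', 'girl', np.nan, np.nan, np.nan],
                          'category': ['male', 'female', np.nan, np.nan, np.nan]})

# Only rows that define a keyword take part in the lookup.
lookup = df_unique.dropna(subset=['find'])

df_master_final = df_master.copy()
# Object dtype so that category strings can be assigned into the NaN column.
df_master_final['category'] = pd.Series(np.nan, index=df_master_final.index, dtype=object)

# One pass per keyword instead of per sentence; each str.contains call
# checks every sentence in the column.
for keyword, category in zip(lookup['find'], lookup['category']):
    hit = df_master_final['sentence'].str.contains(keyword, regex=False)
    # Fill only rows that are still empty, so the first matching keyword wins.
    df_master_final.loc[hit & df_master_final['category'].isna(), 'category'] = category

print(df_master_final)
```

With about 2,000 keywords this is still a Python-level loop, but over keywords rather than over sentence pairs, and without iterrows.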


1 answer
User
#1 · Posted on 2024-06-25 22:46:02

Here is a fully vectorized solution. It assumes that only one search key can (or should) match each sentence.

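import re

import pandas as pd

# Preparing the result dataframe (a copy, so df_master itself is left unchanged).
df_master_2 = df_master.copy()

# Preparing the lookup dataframe to contain only non-empty mappings, one row per keyword.
mapped = df_unique.loc[df_unique['find'].notna(), ['find', 'category']].drop_duplicates(subset='find')

# Extracting the lookup values from the sentence column (re.escape guards against regex metacharacters).
keys_re = '.*({}).*'.format('|'.join(map(re.escape, mapped['find'])))
df_master_2['find'] = df_master_2['sentence'].str.extract(keys_re, expand=False)

# Joining the category; how='left' keeps sentences without a match (their category becomes NaN).
df_master_2 = pd.merge(df_master_2, mapped, on='find', how='left')

# Selecting only the fields we want.
df_master_2 = df_master_2[['sentence', 'category']]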
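A variant of the same idea, offered as a sketch rather than as the answer's own code: it replaces the merge with Series.map over a keyword-to-category dict, adds \b word boundaries so 'boy' no longer matches inside 'boyfriend', and counts hits per sentence to check the one-key assumption. It assumes the keywords are plain words (\b behaves differently around punctuation); the names keyword_to_category, result and n_matches are illustrative.

```python
import re

import numpy as np
import pandas as pd

df_master = pd.DataFrame({'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                                       'ram is a boy', 'sita is a girl', 'John is a boy',
                                       'amie is a girl']})
df_unique = pd.DataFrame({'sentence': ['John is a boy', 'amie is a girl', 'helen is a girl',
                                       'ram is a boy', 'sita is a girl'],
                          'find': ['boy', 'girl', np.nan, np.nan, np.nan],
                          'category': ['male', 'female', np.nan, np.nan, np.nan]})

# keyword -> category, built from the rows of df_unique that define a keyword.
lookup = df_unique.dropna(subset=['find']).drop_duplicates(subset='find')
keyword_to_category = dict(zip(lookup['find'], lookup['category']))

# Whole-word alternation such as r'\b(girl|boy)\b'. Longer keywords go first so a
# multi-word key like 'girl scout' would win over 'girl'; re.escape guards
# against regex metacharacters inside keywords.
keywords = sorted(keyword_to_category, key=len, reverse=True)
pattern = r'\b({})\b'.format('|'.join(map(re.escape, keywords)))

result = df_master.copy()
# First whole-word match in each sentence (NaN if there is none).
result['find'] = result['sentence'].str.extract(pattern, expand=False)
# Dictionary lookup; unmatched sentences stay NaN.
result['category'] = result['find'].map(keyword_to_category)
# Whole-word hits per sentence; values > 1 break the single-key assumption.
result['n_matches'] = result['sentence'].str.count(pattern)

df_master_final = result[['sentence', 'category']]
print(df_master_final)
print(result.loc[result['n_matches'] > 1, 'sentence'])
```

Note that this version takes the first keyword in each sentence, whereas the greedy '.*(...).*' pattern above takes the last one; on the example data the two agree.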
