How to apply a named entity recognition function to all columns and return the names of the columns that meet a condition

import spacy
import pandas as pd
import en_core_web_sm
nlp = en_core_web_sm.load()
text = [["Canada", 'University of California has great research', "non-location"],["China", 'MIT is at Boston', "non-location"]]
df = pd.DataFrame(text, columns = ['text', 'text2', 'text3'])
df['new_col'] = df['text2'].apply(lambda x: [[w.label_] for w in list(nlp(x).ents)])
df

1 answer

User

#1 · Posted on 2024-10-05 13:19:40

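You can take the DataFrame's columns, iterate over them, and use the same logic to append a new column to the existing DataFrame for each one, as shown below: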

import spacy
import pandas as pd
import en_core_web_sm
nlp = en_core_web_sm.load()
text = [["Canada", 'University of California has great research', "non-location"],["China", 'MIT is at Boston', "non-location"]]
df = pd.DataFrame(text, columns = ['text', 'text2', 'text3'])

col_list = list(df.columns)  # snapshot the original column names before new ones are added

for col in col_list:
    # Store the entity labels found in each cell in a new column named ent_<col_name>.
    df["ent_" + col] = df[col].apply(lambda x: [[w.label_] for w in nlp(x).ents])

From the resulting DataFrame, you can then apply a filter to drop the columns that do not contain a GPE value.
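
That filtering step is not spelled out in the answer, so here is a minimal sketch of one way it might look. The helper name columns_with_label and the sample frame are made up for illustration. The labels in the sample are hand-written placeholders shaped like the ent_<col_name> columns built above, not real output from en_core_web_sm, so the snippet runs without loading a spaCy model.

```python
import pandas as pd

def columns_with_label(df, label="GPE", prefix="ent_"):
    """Return the original column names whose ent_ column holds `label` in any row."""
    matches = []
    for col in df.columns:
        if not col.startswith(prefix):
            continue  # skip the original text columns
        # Each cell is a list of one-element lists, e.g. [["ORG"], ["GPE"]].
        if any(label in ent for ents in df[col] for ent in ents):
            matches.append(col[len(prefix):])  # strip the prefix to recover the source column
    return matches

# Hand-written placeholder labels (an assumption, not actual model output).
sample = pd.DataFrame({
    "text": ["Canada", "China"],
    "text2": ["University of California has great research", "MIT is at Boston"],
    "ent_text": [[["GPE"]], [["GPE"]]],
    "ent_text2": [[["ORG"]], [["ORG"], ["GPE"]]],
    "ent_text3": [[], []],
})

gpe_columns = columns_with_label(sample)
print(gpe_columns)
```

On the real DataFrame you would call columns_with_label(df) after the loop above has run. To drop the rest, something like df[gpe_columns] keeps only the matching original columns.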
