Information extraction and relation extraction in Python with Stanford NLP

Published 2024-05-17 08:21:26


How can I use Stanford CoreNLP for Python to extract company names from a collection of documents?

Here is a sample of my data:

‘3Trucks Inc (‘3Trucks’ or the Company) is a tech-enabled long-haul B2B digital platform matching cargo owners with long-haul freight needs and truck owners who can service them, through its internally-developed digital platform.founded in 2016, 3Trucks is headquartered in California and has leased offices in Boston and Florida. Some of their top clients are, Google,IBM and Nokia

3Trucks was founded in 2010, with Mr. Mark Robert as its CEO and John Mclean as a Partner and CTO.'

For information extraction, I want this output:

3Truck

For relation extraction, I want this output:

^{pr2}$

2 answers

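This is fairly simple: you can use spaCy's NER (named entity recognition) for the task. It ships with a set of pretrained models that recognize different entity types.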
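As a rough illustration of this suggestion, the sketch below collects organization names from a text with a pretrained spaCy pipeline. It assumes the `en_core_web_sm` model is installed and that companies are tagged with the `ORG` label, as in spaCy's English pipelines; the function names `unique_orgs` and `extract_company_names` are hypothetical helpers, not part of spaCy.

```python
def unique_orgs(labeled_spans):
    """Return distinct ORG texts, in first-seen order, from (text, label) pairs."""
    seen = []
    for text, label in labeled_spans:
        if label == "ORG" and text not in seen:
            seen.append(text)
    return seen


def extract_company_names(text, model_name="en_core_web_sm"):
    """Run a pretrained spaCy pipeline over `text` and return its ORG entities.

    Assumption: `model_name` refers to an installed spaCy model.
    """
    # Imported lazily so that unique_orgs() can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return unique_orgs((ent.text, ent.label_) for ent in doc.ents)
```

Calling `extract_company_names(sample_text)` on the data above should return whatever the model tags as organizations. A general-purpose model may miss or mislabel a name such as 3Trucks, so the result needs to be checked against your own documents.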

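Named entity recognition is usually what you would use for this kind of application, but NER can only classify entities into a fixed set of categories.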

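# Requires the NLTK tokenizer, POS tagger, and NE chunker data (install via nltk.download).
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.chunk import tree2conlltags

sentence = "Mark and John are working at Google."
# Tokenize, POS-tag, chunk named entities, then flatten the tree
# into (token, POS tag, IOB entity tag) triples.
print(tree2conlltags(ne_chunk(pos_tag(word_tokenize(sentence)))))
# Output:
"""[('Mark', 'NNP', 'B-PERSON'),
    ('and', 'CC', 'O'), ('John', 'NNP', 'B-PERSON'),
    ('are', 'VBP', 'O'), ('working', 'VBG', 'O'),
    ('at', 'IN', 'O'), ('Google', 'NNP', 'B-ORGANIZATION'),
    ('.', '.', 'O')] """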
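To get from tagged triples like these to a list of company names, the IOB tags can be grouped back into entity strings. The following is a minimal sketch that assumes the `(token, POS, IOB tag)` format shown above; `entities_from_iob` is a hypothetical helper, not an NLTK function.

```python
def entities_from_iob(tagged, wanted_label):
    """Join consecutive B-/I- tokens of `wanted_label` into entity strings."""
    entities = []
    current = []
    for token, _pos, tag in tagged:
        if tag == "B-" + wanted_label:
            # A new entity starts; flush the one being built, if any.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I-" + wanted_label and current:
            # Continuation of the entity currently being built.
            current.append(token)
        else:
            # Any other tag ends the current entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# The tagged output from the example sentence, copied as data.
tagged = [
    ("Mark", "NNP", "B-PERSON"), ("and", "CC", "O"),
    ("John", "NNP", "B-PERSON"), ("are", "VBP", "O"),
    ("working", "VBG", "O"), ("at", "IN", "O"),
    ("Google", "NNP", "B-ORGANIZATION"), (".", ".", "O"),
]

print(entities_from_iob(tagged, "ORGANIZATION"))  # ['Google']
print(entities_from_iob(tagged, "PERSON"))        # ['Mark', 'John']
```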

For your application, you will need to train a named entity recognizer on data from your own domain; look into Training NER.
