How to find proper nouns using spaCy NLP

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")

for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

1 answer

User

#1 · Posted 2024-05-20 07:16:14

spaCy tags each token with its part of speech (proper noun, determiner, verb, etc.). You can read the tag at the token level with token.pos_.

In your case:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")

for tok in doc:
    print(tok, tok.pos_)

...
one NUM
of ADP
OpTic PROPN
Gaming PROPN
...

You can then filter for proper nouns, group the consecutive ones, and slice the doc to get each nominal group:

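def extract_proper_nouns(doc):
    """Return one span per run of consecutive PROPN tokens in doc."""
    # Indices of every token tagged as a proper noun
    pos = [tok.i for tok in doc if tok.pos_ == "PROPN"]
    consecutives = []
    current = []
    for elt in pos:
        if len(current) == 0:
            current.append(elt)
        else:
            if current[-1] == elt - 1:
                # Adjacent to the previous proper noun: same group
                current.append(elt)
            else:
                # Gap found: close the current group and start a new one
                consecutives.append(current)
                current = [elt]
    if len(current) != 0:
        consecutives.append(current)
    # Slice the doc from the first to the last index of each group
    return [doc[consecutive[0]:consecutive[-1]+1] for consecutive in consecutives]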

extract_proper_nouns(doc)
[OpTic Gaming, Duty Championship]
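
The grouping step can also be written with itertools.groupby. This is only a sketch of an alternative, not part of the answer above: the helper name propn_runs and its target parameter are made up for illustration, and it works on a plain list of POS tag strings so it can be tried without loading a model. The sample tags are the four shown in the printed output earlier ("one of OpTic Gaming").

```python
from itertools import groupby


def propn_runs(tags, target="PROPN"):
    """Return (start, end) half-open index pairs, one per run of `target` tags.

    Hypothetical helper for illustration; `tags` is a list of POS strings.
    """
    runs = []
    position = 0
    # groupby collapses adjacent items with the same key, so each group
    # is one maximal run of "is PROPN" or "is not PROPN" tags.
    for is_target, group in groupby(tags, key=lambda tag: tag == target):
        length = sum(1 for _ in group)
        if is_target:
            runs.append((position, position + length))
        position += length
    return runs


# Tags for "one of OpTic Gaming", copied from the output shown earlier
tags = ["NUM", "ADP", "PROPN", "PROPN"]
print(propn_runs(tags))  # [(2, 4)]
```

With a real doc, something like [doc[s:e] for s, e in propn_runs([tok.pos_ for tok in doc])] should give the same spans as extract_proper_nouns(doc), since slicing a Doc with a half-open range returns a Span.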

For more details, see: https://spacy.io/usage/linguistic-features
