How to extract noun-adjective pairs, including conjunctions

Posted 2024-05-20 00:55:14

Background

I want to extract noun-adjective pairs using an NLP library such as spaCy.

The expected input and output are as follows:

The pink, beautiful, and small flowers are blown away.
{'flowers':['pink', 'beautiful', 'small']}

I got a red candy and an interesting book.
{'candy':['red'], 'book':['interesting']}

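Problem

Following the answer to a similar question, "How to extract noun adjective pairs from a sentence", I ran that program on my input.

However, it found no pairs and printed an empty list: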

[]

Code

import spacy
nlp = spacy.load('en')
doc = nlp('The beautiful and small flowers are blown away.')
noun_adj_pairs = []
for i,token in enumerate(doc):
    if token.pos_ not in ('NOUN','PROPN'):
        continue
    for j in range(i+1,len(doc)):
        if doc[j].pos_ == 'ADJ':
            noun_adj_pairs.append((token,doc[j]))
            break
print(noun_adj_pairs)

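Attempt

I tried writing new code, but I still don't know how to handle adjectives joined by conjunctions: only one adjective is kept per noun.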

input
I got a red candy and an interesting book.
output
{'candy': 'red', 'book': 'interesting'}

input
The pink, beautiful, and small flowers are blown away.
output
{'flowers': 'small'}

Trial code

import spacy
nlp = spacy.load('en')
doc = nlp('I got a red candy and an interesting book.')
noun_adj_pairs = {}
for word in doc:
    if word.pos_ == 'ADJ' and word.dep_ != "cc":
        if word.head.pos_ =="NOUN":
            noun_adj_pairs[str(word.head.text)]=str(word.text)

print(noun_adj_pairs)

Environment

Python 3.6


1 answer
User
#1 · Posted 2024-05-20 00:55:14

You may want to try noun_chunks:

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('I got a red candy and an interesting and big book.')

noun_adj_pairs = {}
for chunk in doc.noun_chunks:
    adj = []
    noun = ""
    for tok in chunk:
        if tok.pos_ == "NOUN":
            noun = tok.text
        if tok.pos_ == "ADJ":
            adj.append(tok.text)
    if noun:
        noun_adj_pairs.update({noun:adj})

# expected output
noun_adj_pairs
{'candy': ['red'], 'book': ['interesting', 'big']}
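
The loop above can be wrapped in a helper so it is easy to run on both sentences from the question. This is only a minimal sketch: `adjectives_by_noun` and `demo_chunks` are hypothetical names, not part of spaCy, and the helper assumes each chunk is an iterable of tokens exposing `text` and `pos_`, as spaCy's noun chunks do. It collects into a `defaultdict(list)` so that a noun appearing in two chunks keeps the adjectives from both instead of the later chunk overwriting the earlier one.

```python
from collections import defaultdict


def adjectives_by_noun(noun_chunks):
    """Group the adjectives of each noun chunk under the chunk's noun.

    `noun_chunks` is any iterable of token sequences whose tokens expose
    `text` and `pos_` (for example `doc.noun_chunks` in spaCy).
    """
    pairs = defaultdict(list)
    for chunk in noun_chunks:
        nouns = [tok.text for tok in chunk if tok.pos_ == "NOUN"]
        adjectives = [tok.text for tok in chunk if tok.pos_ == "ADJ"]
        if nouns:
            # Like the loop above, use the last noun seen in the chunk
            # ("candy" in "a red candy") as the key.
            pairs[nouns[-1]].extend(adjectives)
    return dict(pairs)


def demo_chunks(text="The pink, beautiful, and small flowers are blown away."):
    """Hypothetical demo; needs spaCy and the en_core_web_sm model installed."""
    import spacy  # imported lazily so the helper itself has no hard dependency

    nlp = spacy.load("en_core_web_sm")
    return adjectives_by_noun(nlp(text).noun_chunks)
```

Calling `demo_chunks()` runs the helper on the first sentence from the question. The exact result depends on how the loaded model tags and chunks the sentence, so it is worth checking on your own data.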

If you also want to keep the conjunctions:

noun_adj_pairs = {}
for chunk in doc.noun_chunks:
    adj = []
    noun = ""
    for tok in chunk:
        if tok.pos_ == "NOUN":
            noun = tok.text
        if tok.pos_ == "ADJ" or tok.pos_ == "CCONJ":
            adj.append(tok.text)
    if noun:
        noun_adj_pairs.update({noun:" ".join(adj)})

noun_adj_pairs
{'candy': 'red', 'book': 'interesting and big'}
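
As an alternative to noun chunks, the trial code in the question can be extended to follow the dependency tree. The sketch below is not from the original answer. It assumes the parser labels the first adjective of a coordinated list as `amod` of the noun and attaches the later ones to an earlier adjective with `conj`; if the model labels every adjective `amod` instead, the same code still works. `adjectives_by_head` and `demo_dependencies` are hypothetical names.

```python
from collections import defaultdict


def adjectives_by_head(doc):
    """Map each noun's text to the adjectives that modify it.

    `doc` is any iterable of tokens exposing `text`, `pos_`, `dep_` and
    `head`, such as a spaCy Doc.
    """
    pairs = defaultdict(list)
    for token in doc:
        if token.pos_ != "ADJ":
            continue
        anchor = token
        # Follow "conj" links back to the first adjective of a coordinated
        # list, e.g. "small" -> "beautiful" -> "pink".
        while anchor.dep_ == "conj" and anchor.head.pos_ == "ADJ":
            anchor = anchor.head
        noun = anchor.head
        if anchor.dep_ == "amod" and noun.pos_ in ("NOUN", "PROPN"):
            # Append rather than assign, so earlier adjectives are kept.
            pairs[noun.text].append(token.text)
    return dict(pairs)


def demo_dependencies(text="I got a red candy and an interesting book."):
    """Hypothetical demo; needs spaCy and the en_core_web_sm model installed."""
    import spacy  # imported lazily so the helper itself has no hard dependency

    nlp = spacy.load("en_core_web_sm")
    return adjectives_by_head(nlp(text))
```

The trial code kept only one adjective per noun because it assigned to the dictionary key, so each new adjective replaced the previous one. Appending to a list avoids that. Whether the adjectives in a given sentence are actually linked this way depends on the model's parse, so printing `token.dep_` and `token.head` for a few examples is a sensible first check.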
