Why is a WDT word labeled as a sentence subject in dependency parsing?

Posted 2024-06-26 14:05:33


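I want to report the subject of each sentence and extract all of its modifiers (e.g. "Donald Trump", not just "Trump"; "The average remaining lease term", not just "term").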
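A minimal sketch of one way to get the modifiers along with the head, assuming spaCy v3: take the span from the subject token's left_edge up to the token itself. That picks up the determiners, adjectives and compounds in front of the head, while leaving out material to its right, such as the appositive ", legend in his own lifetime," that the full subtree (left_edge to right_edge) would include. To keep the example independent of any downloaded model, the parse is written by hand through the Doc constructor's heads/deps arguments; the helper names make_doc and subject_phrases are hypothetical.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model needed


def make_doc(rows, spaces):
    """Build a parsed Doc from hand-written (word, tag, dep, head_index) rows."""
    words, tags, deps, heads = zip(*rows)
    return Doc(nlp.vocab, words=list(words), spaces=spaces,
               tags=list(tags), deps=list(deps), heads=list(heads))


def subject_phrases(doc):
    """Return each "nsubj" head together with the words to its left in its subtree."""
    phrases = []
    for token in doc:
        if token.dep_ == "nsubj":
            # left_edge is the leftmost token dominated by `token`, so this span
            # covers det/amod/compound pre-modifiers but stops at the head itself.
            phrases.append(doc[token.left_edge.i : token.i + 1].text)
    return phrases


# Hand-built parse of "The average remaining lease term is six years."
rows = [
    ("The", "DT", "det", 4),
    ("average", "JJ", "amod", 4),
    ("remaining", "JJ", "amod", 4),
    ("lease", "NN", "compound", 4),
    ("term", "NN", "nsubj", 5),
    ("is", "VBZ", "ROOT", 5),
    ("six", "CD", "nummod", 7),
    ("years", "NNS", "attr", 5),
    (".", ".", "punct", 5),
]
doc = make_doc(rows, spaces=[True] * 7 + [False, False])
print(subject_phrases(doc))  # ['The average remaining lease term']
```

With a real pipeline you would pass nlp(text) from en_core_web_sm instead of the hand-built doc; whether the span comes out right then depends on the model's parse.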

Here is my test code:

import spacy

nlp = spacy.load('en_core_web_sm')

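def handle(doc):
    for sent in doc.sents:
        shown_sentence = False
        for token in sent:
            # Only tokens whose dependency label is exactly "nsubj" are reported
            if token.dep_ == "nsubj":
                # Print each sentence once, before its first subject
                if not shown_sentence:
                    print("----------")
                    print(sent)
                    shown_sentence = True
                print("{0}/{1}".format(token.text, token.tag_))
                # Direct children only (not the whole subtree), with their tags
                print([[t, t.tag_] for t in token.children])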

handle(nlp('Donald Trump, legend in his own lifetime, said: "This transaction is a continuation of our main strategy to invest in assets which offer growth potential and that are coloured pink." The average remaining lease term is six years, and Laura Palmer was killed by Bob. Trump added he will sell up soon.'))

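The output is below. Why do I get "which/WDT" as a subject? Is it just model noise, or is it considered correct behavior? (By the way, in my real sentences, which have the same structure, I also get "that/WDT" labeled as a subject.) (Update: if I switch to "en_core_web_md", I do get "that/WDT" for my Trump example; that is the only difference when moving from the small model to the medium one.)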

I can easily filter these out by looking at tag_; I'm more interested in the underlying reason.
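If the parser treats "which offer growth potential" as a relative clause, labeled relcl and attached to "assets" (which is how spaCy's English models typically analyse such clauses), then "which" is the subject of "offer" inside that clause, and a WDT subject can be traced back to the noun it stands for instead of being discarded. The sketch below, again assuming spaCy v3 with a hand-built parse and using a hypothetical helper name, maps such subjects to their antecedent and leaves every other subject unchanged.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model needed


def make_doc(rows, spaces):
    """Build a parsed Doc from hand-written (word, tag, dep, head_index) rows."""
    words, tags, deps, heads = zip(*rows)
    return Doc(nlp.vocab, words=list(words), spaces=spaces,
               tags=list(tags), deps=list(deps), heads=list(heads))


def subjects_with_antecedents(doc):
    """Yield (subject, referent) pairs for every "nsubj" token.

    A WDT subject whose head verb is attached with "relcl" is mapped to the
    noun that the relative clause modifies; other subjects map to themselves.
    """
    for token in doc:
        if token.dep_ != "nsubj":
            continue
        verb = token.head
        if token.tag_ == "WDT" and verb.dep_ == "relcl":
            yield token, verb.head  # e.g. which -> offer -> assets
        else:
            yield token, token


# Hand-built parse of "We invest in assets which offer growth potential."
rows = [
    ("We", "PRP", "nsubj", 1),
    ("invest", "VBP", "ROOT", 1),
    ("in", "IN", "prep", 1),
    ("assets", "NNS", "pobj", 2),
    ("which", "WDT", "nsubj", 5),
    ("offer", "VBP", "relcl", 3),
    ("growth", "NN", "compound", 7),
    ("potential", "NN", "dobj", 5),
    (".", ".", "punct", 1),
]
doc = make_doc(rows, spaces=[True] * 7 + [False, False])
for subj, ref in subjects_with_antecedents(doc):
    print(subj.text, "->", ref.text)
# We -> We
# which -> assets
```

A coordinated clause such as "and that are coloured pink" may hang off the first clause with conj rather than relcl, in which case this sketch leaves it unresolved. Dropping WDT subjects altogether is just the tag_ filter, e.g. [t for t in doc if t.dep_ == "nsubj" and t.tag_ != "WDT"].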

Update: by the way, "Laura Palmer" is not picked up as a subject by this code, because its dep_ value is "nsubjpass" rather than "nsubj".
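To also report passive subjects, one option is to widen the label check from "nsubj" to a set that includes "nsubjpass", the label spaCy's English models give the subject of a passive verb. This is a sketch assuming spaCy v3; the passive clause is parsed by hand for illustration, and SUBJECT_DEPS, make_doc and subject_tokens are hypothetical names.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model needed

# Labels treated as subjects here; "nsubjpass" marks the subject of a passive verb.
SUBJECT_DEPS = {"nsubj", "nsubjpass"}


def make_doc(rows, spaces):
    """Build a parsed Doc from hand-written (word, tag, dep, head_index) rows."""
    words, tags, deps, heads = zip(*rows)
    return Doc(nlp.vocab, words=list(words), spaces=spaces,
               tags=list(tags), deps=list(deps), heads=list(heads))


def subject_tokens(doc):
    """Return active and passive nominal subjects, in document order."""
    return [t for t in doc if t.dep_ in SUBJECT_DEPS]


# Hand-built parse of "Laura Palmer was killed by Bob."
rows = [
    ("Laura", "NNP", "compound", 1),
    ("Palmer", "NNP", "nsubjpass", 3),
    ("was", "VBD", "auxpass", 3),
    ("killed", "VBN", "ROOT", 3),
    ("by", "IN", "agent", 3),
    ("Bob", "NNP", "pobj", 4),
    (".", ".", "punct", 3),
]
doc = make_doc(rows, spaces=[True] * 5 + [False, False])
print([t.text for t in subject_tokens(doc)])  # ['Palmer']
```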

----------
Donald Trump, legend in his own lifetime, said: "This transaction is a continuation of our main strategy to invest in assets which offer growth potential and that are coloured pink."
Trump/NNP
[[Donald, 'NNP'], [,, ','], [legend, 'NN'], [,, ',']]
transaction/NN
[[This, 'DT']]
which/WDT
[]
----------
The average remaining lease term is six years, and Laura Palmer was killed by Bob.
term/NN
[[The, 'DT'], [average, 'JJ'], [remaining, 'JJ'], [lease, 'NN']]
----------
Trump added he will sell up soon.
Trump/NNP
[]
he/PRP
[]

(By the way, the bigger picture is pronoun resolution: I want to turn PRPs into the text they refer to.)

