Anaphora resolution in stanfordnlp using Python

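3 Answers

User

Answer 1 · edited 2024-10-06 06:57:21

Here is one possible solution. It uses the data structure output by CoreNLP, which provides all the information needed. It is not a complete solution and will probably need extending to handle every case, but it is a good starting point.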

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)

This produces the following output:

[output listing missing from the source]

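As you can see, this solution does not correct the case when a pronoun has a sentence-initial (title-case) antecedent ("The big cat" instead of "the big cat" in the last sentence). The right fix depends on the category of the antecedent: common-noun antecedents need lowercasing, while proper-noun antecedents do not. Some other ad hoc processing may be necessary, as with the possessives in my test sentence. The code also assumes you will not want to reuse the original output tokens, because it modifies them in place. One way around this is to copy the original data structure, or to create a new attribute and change the print_resolved function accordingly. Correcting any resolution errors is yet another challenge!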
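The two workarounds mentioned above (copying the data structure and fixing the case) could be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name resolve_copy and the 'resolved' attribute are made up for the example, the proper-noun check relies only on the POS tag of the antecedent's first token, and the sample dictionary is hand-built to mimic the relevant fields of the CoreNLP JSON output rather than produced by a real server.

```python
import copy


def resolve_copy(corenlp_output):
    """Return a deep copy whose pronominal tokens carry a 'resolved' attribute.

    Hypothetical helper: the name and the 'resolved' key are illustrative only.
    """
    resolved = copy.deepcopy(corenlp_output)  # the caller's data stays untouched
    for mentions in resolved['corefs'].values():
        antecedent = mentions[0]  # same assumption as resolve(): first mention is the antecedent
        # Use the POS tag of the antecedent's first token to guess whether it is a proper noun.
        first_token = resolved['sentences'][antecedent['sentNum'] - 1]['tokens'][antecedent['startIndex'] - 1]
        is_proper = first_token['pos'] in ('NNP', 'NNPS')
        for mention in mentions[1:]:
            if mention['type'] != 'PRONOMINAL':
                continue
            text = antecedent['text']
            sentence_initial = mention['startIndex'] == 1  # indices are 1-based
            if sentence_initial:
                text = text[0].upper() + text[1:]
            elif not is_proper:
                text = text[0].lower() + text[1:]
            token = resolved['sentences'][mention['sentNum'] - 1]['tokens'][mention['startIndex'] - 1]
            token['resolved'] = text  # keep token['word'] as it was
    return resolved


# Hand-built stand-in for the parts of the CoreNLP output used above.
words = [('The', 'DT'), ('big', 'JJ'), ('cat', 'NN'), ('ate', 'VBD'),
         ('its', 'PRP$'), ('dinner', 'NN'), ('.', '.')]
sample = {
    'corefs': {'1': [
        {'text': 'The big cat', 'type': 'NOMINAL', 'sentNum': 1, 'startIndex': 1},
        {'text': 'its', 'type': 'PRONOMINAL', 'sentNum': 1, 'startIndex': 5},
    ]},
    'sentences': [{'tokens': [{'word': w, 'pos': p} for w, p in words]}],
}

resolved_output = resolve_copy(sample)
print(resolved_output['sentences'][0]['tokens'][4]['resolved'])
```

With this approach, print_resolved would need to read token.get('resolved', token['word']) instead of token['word'].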

User

Answer 2 · edited 2024-10-06 06:57:21

I had a similar problem. After trying CoreNLP, I solved it with neuralcoref. With the following code you can do this easily through neuralcoref:

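import spacy

nlp = spacy.load('en_coref_md')

doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3.')

print(doc._.coref_clusters)

print(doc._.coref_resolved)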

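The output of the above code is: [Phone area code: [Phone area code, It, It, It]]

Phone area code will be valid only when all the below conditions are met. Phone area code cannot be left blank. Phone area code should be numeric. Phone area code cannot be less than 200. Minimum number of digits should be 3.

For this you will need spaCy along with an English coref model, which can be en_coref_md, en_coref_lg, or en_coref_sm. You can refer to the following link for a better explanation: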

https://github.com/huggingface/neuralcoref
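To make the relationship between the clusters and the resolved text more concrete, here is a rough, library-free sketch of the substitution step. It is not neuralcoref's implementation: the function resolve_clusters, the (start, end) token-span representation, and the shortened example sentence are all assumptions made for illustration.

```python
def resolve_clusters(tokens, clusters):
    """Replace every later mention in a cluster with the cluster's first mention.

    tokens   -- list of word strings
    clusters -- list of clusters; each cluster is a list of (start, end) token
                spans with an exclusive end, and its first span is the main mention
    """
    replacements = {}
    for spans in clusters:
        main_start, main_end = spans[0]
        main_words = tokens[main_start:main_end]
        for start, end in spans[1:]:
            replacements[start] = (end, main_words)

    resolved = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, main_words = replacements[i]
            resolved.extend(main_words)  # substitute the main mention
            i = end                      # skip over the replaced mention
        else:
            resolved.append(tokens[i])
            i += 1
    return resolved


tokens = "Phone area code will be valid . It cannot be left blank . It should be numeric .".split()
clusters = [[(0, 3), (7, 8), (13, 14)]]  # "Phone area code", "It", "It"
print(' '.join(resolve_clusters(tokens, clusters)))
```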

User

Answer 3 · edited 2024-10-06 06:57:21

from stanfordnlp.server import CoreNLPClient
from nltk import tokenize

client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner', 'parse', 'coref'], memory='4G', endpoint='http://localhost:9001')

def pronoun_resolution(text):

    ann = client.annotate(text)
    modified_text = tokenize.sent_tokenize(text)

    for coref in ann.corefChain:

        antecedent = None  # text of the first mention in the chain
        for mention in coref.mention:
            # collect the words covered by this mention
            sentence = ann.sentence[mention.sentenceIndex]
            phrase = ' '.join(sentence.token[i].word for i in range(mention.beginIndex, mention.endIndex))
            if antecedent is None:
                antecedent = phrase
            else:
                # replace later mentions with the antecedent's text
                modified_text[mention.sentenceIndex] = modified_text[mention.sentenceIndex].replace(phrase, antecedent)

    modified_text = ' '.join(modified_text)

    return modified_text

text = 'Tom is a smart boy. He knows a lot of things.'
print(pronoun_resolution(text))

Output: 'Tom is a smart boy. Tom knows a lot of things.'
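One caveat with the code above: str.replace substitutes every occurrence of the anaphor as a substring, so a short pronoun such as "he" is also replaced inside longer words. A possible refinement, sketched here with a hypothetical helper replace_whole_word that is not part of the original answer, is to match whole words only using the standard re module. It still replaces every whole-word occurrence in the sentence rather than only the specific mention, and it assumes the anaphor begins and ends with a word character.

```python
import re


def replace_whole_word(sentence, anaphor, antecedent):
    """Replace whole-word occurrences of anaphor in sentence with antecedent."""
    pattern = r'\b' + re.escape(anaphor) + r'\b'
    # A function is used as the replacement so the antecedent is inserted literally.
    return re.sub(pattern, lambda match: antecedent, sentence)


sentence = "The boy said he left."
print(sentence.replace("he", "Tom"))             # TTom boy said Tom left.
print(replace_whole_word(sentence, "he", "Tom"))  # The boy said Tom left.
```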
