<p>这里有一个可能的解决方案,它使用CoreNLP输出的数据结构。提供所有信息。这不是一个完整的解决方案,可能需要扩展来处理所有情况,但这是一个很好的起点。在</p>
<pre><code>from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
def resolve(corenlp_output):
""" Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
for coref in corenlp_output['corefs']:
mentions = corenlp_output['corefs'][coref]
antecedent = mentions[0] # the antecedent is the first mention in the coreference chain
for j in range(1, len(mentions)):
mention = mentions[j]
if mention['type'] == 'PRONOMINAL':
# get the attributes of the target mention in the corresponding sentence
target_sentence = mention['sentNum']
target_token = mention['startIndex'] - 1
# transfer the antecedent's word form to the appropriate token in the sentence
corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']
def print_resolved(corenlp_output):
""" Print the "resolved" output """
possessives = ['hers', 'his', 'their', 'theirs']
for sentence in corenlp_output['sentences']:
for token in sentence['tokens']:
output_word = token['word']
# check lemmas as well as tags for possessive pronouns in case of tagging errors
if token['lemma'] in possessives or token['pos'] == 'PRP$':
output_word += "'s" # add the possessive morpheme
output_word += token['after']
print(output_word, end='')
text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
"hers is blue. It is older than hers. The big cat ate its dinner."
output = nlp.annotate(text, properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})
resolve(output)
print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)
</code></pre>
<p>这将产生以下输出:</p>
^{pr2}$
<p>正如你所看到的,当代词有一个句子首字母(标题大小写)先行词时,这个解决方案并不处理大小写的更正(“大猫咪”而不是最后一个句子中的“大猫”)。这取决于先行词的类别-普通名词先行词需要小写,而专有名词先行词则不需要。
其他一些特殊的处理可能是必要的(比如我测试句子中的所有格)。它还假定您不希望重用原始输出标记,因为它们是由此代码修改的。一种解决方法是复制原始数据结构或创建一个新属性,并相应地更改<code>print_resolved</code>函数。
纠正任何分辨率错误也是另一个挑战!在</p>