CoreNLP is not splitting sentences


I have the following review text:

"The tutu's was for my neice... She LOVED IT!!! It fit well and will fit her for some time with the elastic waist.... great quality and very inexpensive! I would buy her another easily."

and I send it to the CoreNLP server:

import unicodedata

properties = {
    "tokenize.whitespace": "true",
    "annotators": "tokenize, ssplit, pos, lemma, ner, parse",
    "outputFormat": "json"
}

# Normalize unicode input down to plain ASCII before sending it to the server.
if not isinstance(paragraph, str):
    paragraph = unicodedata.normalize('NFKD', paragraph).encode('ascii', 'ignore')

result = self.nlp.annotate(paragraph, properties=properties)
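(`self.nlp` is constructed elsewhere in my class; a self-contained sketch of the same call, assuming the pycorenlp client and a CoreNLP server already running on localhost:9000, would look like this:)

from pycorenlp import StanfordCoreNLP

# Client for a CoreNLP server assumed to be running locally on port 9000.
nlp = StanfordCoreNLP('http://localhost:9000')

properties = {
    "tokenize.whitespace": "true",
    "annotators": "tokenize, ssplit, pos, lemma, ner, parse",
    "outputFormat": "json"
}

paragraph = ("The tutu's was for my neice... She LOVED IT!!! It fit well and will fit her "
             "for some time with the elastic waist.... great quality and very inexpensive! "
             "I would buy her another easily.")

# With outputFormat set to json, annotate() returns a parsed dict.
result = nlp.annotate(paragraph, properties=properties)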

Running this gives me the following result:

{  
   u'sentences':[  
      {  
         u'parse':u'SENTENCE_SKIPPED_OR_UNPARSABLE',
         u'index':0,
         u'tokens':[  
            {  
               u'index':1,
               u'word':u'The',
               u'lemma':u'the',
               u'pos':u'DT',
               u'characterOffsetEnd':3,
               u'characterOffsetBegin':0,
               u'originalText':u'The'
            },
            {  
               u'index':2,
               u'word':u"tutu's",
               u'lemma':u"tutu'",
               u'pos':u'NNS',
               u'characterOffsetEnd':10,
               u'characterOffsetBegin':4,
               u'originalText':u"tutu's"
            },
            // ...
            {  
               u'index':34,
               u'word':u'easily.',
               u'lemma':u'easily.',
               u'pos':u'NN',
               u'characterOffsetEnd':187,
               u'characterOffsetBegin':180,
               u'originalText':u'easily.'
            }
         ]
      }
   ]
}

I noticed that the sentences are not being split. Any idea where the problem lies?

If I use the web interface at http://localhost:9000, I see the sentences split correctly.
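A quick programmatic check of the behaviour (a one-line sketch, assuming the `result` dict returned above):

# Everything comes back as a single "sentence" when tokenize.whitespace is on.
print(len(result['sentences']))   # prints 1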


1 answer

Not sure why, but the problem seems to come from tokenize.whitespace. I simply commented it out:

properties = {
    #"tokenize.whitespace": "true",
    "annotators": "tokenize, ssplit, pos, lemma, ner, parse",
    "outputFormat": "json"
}
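One plausible explanation: with tokenize.whitespace set to true the tokenizer splits only on whitespace, so sentence-final punctuation stays attached to the preceding word (IT!!! and easily. in the output above), and ssplit never sees standalone ., ! or ? tokens to use as sentence boundaries, so the whole review comes back as one sentence. Removing the option restores the default PTB tokenizer, which separates the punctuation and lets ssplit do its job. A sketch of the corrected call and a quick look at the resulting split, assuming the same pycorenlp client and paragraph as in the question sketch above:

result = nlp.annotate(paragraph, properties=properties)

# The review now comes back as several sentences instead of one.
for sentence in result['sentences']:
    print(' '.join(token['word'] for token in sentence['tokens']))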
