Java: Stanford NLP is very slow at annotating text

I am running an NLP project in Java on a Windows machine, using Stanford CoreNLP. I want to annotate a large text article. The code I wrote is as follows:

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("Text to be annotated. This text is very long!");
pipeline.annotate(document); // this line takes a long time

Annotating the text takes quite a long time. For roughly 60 words, this one line takes about 16 seconds, which is far too long.

Is there a way to speed this up, or am I missing something? Please tell me what I can do. Thanks in advance :-)

Edit

Code sample

    public TextReader() {
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, regexner");
        pipeline = new StanfordCoreNLP(props);
        extractor = CoreMapExpressionExtractor.createExtractorFromFiles(
                TokenSequencePattern.getNewEnv(),
                "Stanford NLP\\stanford-corenlp-full-2015-01-29\\stanford-corenlp-full-2015-01-30\\tokensregex\\color.rules.txt");
        text = "Barak Obama was born on August 4, 1961,at Kapiolani Maternity & Gynecological Hospital "
                + " in Honolulu, Hawaii, and would become the first President to have been born in Hawaii. His mother, Stanley Ann Dunham,"
                + " was born in Wichita, Kansas, and was of mostly English ancestry. His father, Barack Obama, Sr., was a Luo from Nyang’oma"
                + " Kogelo, Kenya. He studied at the University of Westminster. His favourite colour is red.";
        Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator starting...", text); // LOG 1
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator finished...", props); // LOG 2
        sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // The tokens of the sentence are taken and iterated over;
            // the NER, POS and lemma of each token are stored iteratively.
        }
    }

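I found that the time between log 1 and log 2 is about 16 seconds. I need to process much longer texts, and that would take a very long time. Please tell me what I am doing wrong.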

Thanks =D


1 answer

  1. # Answer 1

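    Is the text one long sentence? The parser's running time is O(n^3) in the length of the sentence, so it gets very slow for sentences longer than about 40 words. Does it speed up if you remove the "parse, dcoref, regexner" annotators? If you then add "parse" back, does it slow down again?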
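
To make the cubic growth concrete, here is a rough sketch. It assumes the cost is exactly n^3 in the number of tokens and ignores constant factors, so the numbers are only relative. The class and method names are invented for illustration and are not part of CoreNLP.

```java
public class ParseCostSketch {
    /** Relative cost of parsing one sentence under a pure n^3 model (assumption). */
    public static long cubicCost(int sentenceLength) {
        long n = sentenceLength;
        return n * n * n;
    }

    /** Total relative cost of a text, given the token count of each sentence. */
    public static long totalCost(int... sentenceLengths) {
        long total = 0;
        for (int length : sentenceLengths) {
            total += cubicCost(length);
        }
        return total;
    }

    public static void main(String[] args) {
        long oneLongSentence = totalCost(60);             // 60^3 = 216000
        long threeShortSentences = totalCost(20, 20, 20); // 3 * 20^3 = 24000
        System.out.println("one 60-token sentence:    " + oneLongSentence);
        System.out.println("three 20-token sentences: " + threeShortSentences);
        System.out.println("ratio: " + (oneLongSentence / threeShortSentences)); // 9
    }
}
```

Under this simplified model, the same 60 tokens cost about nine times as much when they form one sentence as when they are split into three, which is why sentence length matters more than total text length.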

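    If you care about dependency parses rather than constituency parses, the new "depparse" annotator will produce them much faster; however, our coref cannot yet work from dependency parses (coming soon!).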
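
As a minimal sketch of that swap, the properties from the question's second example could be rebuilt with "depparse" in place of "parse". This assumes the installed CoreNLP release ships the "depparse" annotator and that "dcoref" is not needed; the class and method names are hypothetical, and only `java.util.Properties` is used, so the snippet runs without CoreNLP on the classpath.

```java
import java.util.Properties;

public class FastPipelineProps {
    /**
     * Builds the annotator list from the question with "parse" replaced by
     * "depparse". "dcoref" is left out because, per the answer, coref cannot
     * yet use dependency parses.
     */
    public static Properties dependencyOnlyProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse, regexner");
        return props;
    }

    public static void main(String[] args) {
        Properties props = dependencyOnlyProps();
        System.out.println(props.getProperty("annotators"));
        // With CoreNLP available, these properties would be passed on as in the question:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Timing `pipeline.annotate(document)` with this list and again with the original list should show whether the constituency parser accounts for most of the 16 seconds.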