Java: Stanford NLP is very slow at annotating text

I am using Stanford CoreNLP for an NLP project in Java on a Windows machine. I want to annotate a large text article. The code I wrote is as follows:

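import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // builds the pipeline and loads the annotator models
Annotation document = new Annotation("Text to be annotated. This text is very long!");
pipeline.annotate(document); // this line takes a long time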

Annotating the text takes a considerable amount of time: for about 60 words, this line takes roughly 16 seconds, which is far too long.
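To put that rate in perspective, here is a rough extrapolation. It assumes, optimistically, that the cost grows only linearly with the number of words; the 10,000-word article is a hypothetical size, not a figure from the question. Sentences that are much longer than average would cost disproportionately more in the `parse` step, so real timings could be worse.

```latex
$$
\frac{16\ \text{s}}{60\ \text{words}} \approx 0.27\ \text{s/word},
\qquad
10\,000\ \text{words} \times \frac{16\ \text{s}}{60\ \text{words}} \approx 2\,667\ \text{s} \approx 44\ \text{min}.
$$
```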

Is there a way to speed this up, or am I missing something? Please let me know what I can do. Thanks in advance :-)
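One thing worth checking is whether every listed annotator is actually needed. The loop in the edited code below only reads each token's NER tag, POS tag and lemma, none of which depend on `parse` or `dcoref`, and those two are generally among the most expensive annotators. (The edited code already drops `dcoref` but still runs `parse`.) The stdlib-only sketch below resolves a minimal, correctly ordered annotator list from the annotators you actually need. The `REQUIRES` table is a simplified assumption modelled on older CoreNLP releases, not CoreNLP's own API; check the annotator documentation for the version you use.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class MinimalAnnotators {
    // ASSUMPTION: simplified annotator dependency table, not taken from CoreNLP itself.
    static final Map<String, List<String>> REQUIRES = Map.of(
            "tokenize", List.of(),
            "ssplit", List.of("tokenize"),
            "pos", List.of("tokenize", "ssplit"),
            "lemma", List.of("tokenize", "ssplit", "pos"),
            "ner", List.of("tokenize", "ssplit", "pos", "lemma"),
            "regexner", List.of("tokenize", "ssplit"),
            "parse", List.of("tokenize", "ssplit"),
            "dcoref", List.of("tokenize", "ssplit", "pos", "lemma", "ner", "parse"));

    // Returns the wanted annotators plus their dependencies, dependencies first, each listed once.
    static List<String> resolve(List<String> wanted) {
        LinkedHashSet<String> ordered = new LinkedHashSet<>();
        for (String annotator : wanted) {
            visit(annotator, ordered);
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first: add all dependencies of `annotator` before the annotator itself.
    private static void visit(String annotator, LinkedHashSet<String> ordered) {
        if (ordered.contains(annotator)) {
            return;
        }
        for (String dependency : REQUIRES.getOrDefault(annotator, List.of())) {
            visit(dependency, ordered);
        }
        ordered.add(annotator);
    }

    public static void main(String[] args) {
        // What the token loop actually uses: POS tags, lemmas and NER tags (plus regexner rules).
        List<String> annotators = resolve(List.of("pos", "lemma", "ner", "regexner"));
        // Prints: tokenize, ssplit, pos, lemma, ner, regexner
        System.out.println(String.join(", ", annotators));
    }
}
```

The joined string could then be passed to `props.setProperty("annotators", ...)`, which leaves `parse` and `dcoref` out of the pipeline entirely.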

Edit

Code sample:

    public TextReader() {
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, regexner");
        pipeline = new StanfordCoreNLP(props);
        extractor = CoreMapExpressionExtractor.createExtractorFromFiles(
                TokenSequencePattern.getNewEnv(),
                "Stanford NLP\\stanford-corenlp-full-2015-01-29\\stanford-corenlp-full-2015-01-30\\tokensregex\\color.rules.txt");
        text = "Barak Obama was born on August 4, 1961,at Kapiolani Maternity & Gynecological Hospital "
                + " in Honolulu, Hawaii, and would become the first President to have been born in Hawaii. His mother, Stanley Ann Dunham,"
                + " was born in Wichita, Kansas, and was of mostly English ancestry. His father, Barack Obama, Sr., was a Luo from Nyang’oma"
                + " Kogelo, Kenya. He studied at the University of Westminster. His favourite colour is red.";
        Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator starting...", text); // LOG 1
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator finished...", props); // LOG 2
        sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // the tokens of the sentence are taken and iterated over;
            // the NER tag, POS tag and lemma of each token are stored
        }
    }

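I noticed that the time between LOG 1 and LOG 2 is about 16 seconds. I need to process much longer texts, which will take a very long time. Please tell me what I am doing wrong.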
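Because LOG 1 and LOG 2 bracket a single `annotate` call, one measurement cannot separate a possible one-time warm-up on the first call (for example JVM JIT compilation) from the steady-state cost per text. A small stdlib helper that times the same task several times in a row makes that difference visible. In real use the task would be something like `() -> pipeline.annotate(new Annotation(text))` on an already constructed pipeline; the string-building task in `main` is only a hypothetical stand-in so the sketch runs without CoreNLP.

```java
import java.util.Arrays;

public class RepeatTimer {
    // Runs the task `runs` times and returns each run's wall-clock time in milliseconds.
    static long[] timeRepeated(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in task; replace with e.g. () -> pipeline.annotate(new Annotation(text))
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
            // Use the result so the work is not optimized away.
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        long[] times = timeRepeated(task, 5);
        // A first entry much larger than the rest suggests the first call includes warm-up cost.
        System.out.println("per-run ms: " + Arrays.toString(times));
    }
}
```

If the later runs are still slow, the cost lies in the annotators themselves rather than in warm-up, and trimming the annotator list is the more promising lever.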

Thanks =D

1 answer

# Answer 1