Custom Solr TokenFilter lemmatizer in Java
I'm trying to write a simple Solr lemmatizer to use in a field type, but I can't seem to find any information on writing token filters, so I'm a bit lost. This is the code I have so far:
import java.io.IOException;
import java.util.List;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
class FooFilter extends TokenFilter {
    private static final Logger log = LoggerFactory.getLogger(FooFilter.class);
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);

    public FooFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Note: termAtt.buffer() may contain stale characters past length(),
        // so use toString(), which copies exactly length() chars.
        List<String> allForms = Lemmatize.getAllForms(termAtt.toString());
        if (!allForms.isEmpty()) {
            for (String word : allForms) {
                // Now what?
            }
        }
        return true;
    }
}
# Answer 1
Next, you want to either replace the current token in termAtt with your word, or append additional tokens alongside it.
Example of replace semantics:
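A minimal sketch of the replace approach, assuming the asker's hypothetical `Lemmatize.getAllForms` helper and picking the first returned form; the token's position and offsets are left untouched:

```java
@Override
public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
        return false;
    }
    // toString() copies exactly length() chars from the term buffer
    List<String> allForms = Lemmatize.getAllForms(termAtt.toString());
    if (!allForms.isEmpty()) {
        // Replace semantics: overwrite the current token's text in place.
        // setEmpty() clears the term; append() writes the new text.
        termAtt.setEmpty().append(allForms.get(0));
    }
    return true;
}
```

`setEmpty().append(...)` is the standard way to rewrite a `CharTermAttribute`; it reuses the internal buffer rather than allocating a new token.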
Example of add-new-token semantics:
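A sketch of the append approach, again assuming the hypothetical `Lemmatize.getAllForms` helper. The usual Lucene pattern (as in the synonym filters) is to queue the extra forms, capture the attribute state of the original token, and then emit the queued forms on later calls to `incrementToken` with a position increment of 0 so they stack at the same position:

```java
private final LinkedList<String> pending = new LinkedList<>();
private State savedState;

@Override
public boolean incrementToken() throws IOException {
    // First drain any queued alternative forms from the previous token.
    if (!pending.isEmpty()) {
        restoreState(savedState);               // restore offsets etc. of the original
        termAtt.setEmpty().append(pending.poll());
        posAtt.setPositionIncrement(0);         // stack at the same position
        return true;
    }
    if (!input.incrementToken()) {
        return false;
    }
    String term = termAtt.toString();
    for (String word : Lemmatize.getAllForms(term)) {
        if (!word.equals(term)) {
            pending.add(word);                  // queue each extra form
        }
    }
    if (!pending.isEmpty()) {
        savedState = captureState();            // snapshot the original token's attributes
    }
    return true;                                // emit the original token unchanged
}
```

`captureState()`/`restoreState()` are inherited from `AttributeSource`; they let each queued token copy the original token's offsets and other attributes before the term text is overwritten. A `reset()` override should also clear `pending` so a reused stream starts clean.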
For each token you want to add, you must set the CharTermAttribute and have the incrementToken routine return true.