How do I create a custom analyzer in PyLucene 8.6.1?

Posted 2024-09-30 04:37:53


I have already looked at this, this, and this, but I'm not sure why they don't work for me.

I normally use an analyzer like the one below:

import lucene
from org.apache.lucene.analysis.core import WhitespaceAnalyzer
from org.apache.lucene.index import IndexWriterConfig, IndexWriter
from org.apache.lucene.store import SimpleFSDirectory
from java.nio.file import Paths
from org.apache.lucene.document import Document, Field, TextField

index_path = "./index"

lucene.initVM()

analyzer = WhitespaceAnalyzer()
config = IndexWriterConfig(analyzer)
store = SimpleFSDirectory(Paths.get(index_path))
writer = IndexWriter(store, config)

doc = Document()
doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
writer.addDocument(doc)

writer.close()
store.close()

Instead of WhitespaceAnalyzer(), I want to use MyAnalyzer(), which should combine a WhitespaceTokenizer with a LowerCaseFilter:

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        # What do I write here?
        pass  # placeholder so the class definition parses

Can you help me write and use MyAnalyzer()?


1 answer
User
Answer 1 · posted 2024-09-30 04:37:53

Based on what I found here and here, the following approach works:

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer
from org.apache.lucene.analysis import Analyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        source = WhitespaceTokenizer()
        result = LowerCaseFilter(source)
        return Analyzer.TokenStreamComponents(source, result)
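
The question also asks how to use MyAnalyzer(). Below is a minimal sketch that plugs the class above into the indexing code from the question, assuming PyLucene 8.6.1 is installed and importable. The function name index_with_my_analyzer is hypothetical and only wraps the steps so they can be called once; the PyLucene imports sit inside the function so that defining it does not require a JVM.

```python
def index_with_my_analyzer(index_path="./index"):
    """Index one document with a lowercase + whitespace analyzer.

    Hypothetical helper for illustration; assumes PyLucene 8.6.1.
    """
    import lucene
    from java.nio.file import Paths
    from org.apache.lucene.analysis import Analyzer
    from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
    from org.apache.lucene.document import Document, Field, TextField
    from org.apache.lucene.index import IndexWriter, IndexWriterConfig
    from org.apache.lucene.store import SimpleFSDirectory
    from org.apache.pylucene.analysis import PythonAnalyzer

    # Start the JVM before any Java object is created
    # (call this only once per process).
    lucene.initVM()

    class MyAnalyzer(PythonAnalyzer):
        def __init__(self):
            PythonAnalyzer.__init__(self)

        def createComponents(self, fieldName):
            # Split on whitespace, then lowercase every token.
            source = WhitespaceTokenizer()
            result = LowerCaseFilter(source)
            return Analyzer.TokenStreamComponents(source, result)

    # Same setup as in the question, with MyAnalyzer swapped in
    # for WhitespaceAnalyzer.
    config = IndexWriterConfig(MyAnalyzer())
    store = SimpleFSDirectory(Paths.get(index_path))
    writer = IndexWriter(store, config)
    try:
        doc = Document()
        doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
        writer.addDocument(doc)
    finally:
        writer.close()
        store.close()
```

Calling index_with_my_analyzer("./index") should write the same single document as the original script. The only intended difference is that the indexed terms for "title" are lowercased; the stored field value is left unchanged.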

If someone could point me in the right direction so that I can find answers like these myself, that would be great.
