Deep NLP pipeline with Whoosh

Posted 2024-09-24 02:17:42


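I am new to NLP and IR programming. I am trying to implement a deep NLP pipeline, i.e. adding lemmatization and dependency parsing to sentence indexing. Below are my schema and search code.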

my_analyzer = RegexTokenizer() | StopFilter() | LowercaseFilter() | StemFilter() | Lemmatizer()
pos_analyser = RegexTokenizer() | StopFilter() | LowercaseFilter() | PosTagger()
schema = Schema(id=ID(stored=True, unique=True), stem_text=TEXT(stored=True, analyzer=my_analyzer), pos_tag=pos_analyser)

for sentence in sent_tokenize_list1:
    writer.add_document(stem_text=sentence, pos_tag=sentence)
for sentence in sent_tokenize_list2:
    writer.add_document(stem_text=sentence, pos_tag=sentence)
writer.commit()
with ix.searcher() as searcher:
    og = qparser.OrGroup.factory(0.9)
    query_text = MultifieldParser(["stem_text", "pos_tag"], schema=ix.schema, group=og).parse(
        "who is controlling the threat of locusts?")
    results = searcher.search(query_text, sortedby=scores, limit=10)

This is the custom analyzer.

[custom analyzer code missing from the source]

I get the following error.

whoosh.fields.FieldConfigurationError: CompositeAnalyzer(RegexTokenizer(expression=re.compile('\w+(\.?\w+)*'), gaps=False), StopFilter(stops=frozenset({'for', 'will', 'tbd', 'with', 'and', 'the', 'if', 'it', 'by', 'is', 'are', 'this', 'as', 'when', 'us', 'or', 'from', 'yet', 'you', 'have', 'can', 'be', 'we', 'of', 'to', 'on', 'a', 'an', 'your', 'at', 'in', 'may', 'not', 'that'}), min=2, max=None, renumber=True), LowercaseFilter(), PosTagger(cache={})) is not a FieldType object

Am I doing something wrong? Is this the right way to add an NLP pipeline to a search engine?


1 answer
User
#1 · Posted 2024-09-24 02:17:42

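pos_tag should be assigned a field type, TEXT(stored=True, analyzer=pos_analyser), rather than pos_analyser directly. Every keyword argument passed to Schema must be a FieldType object, and an analyzer on its own is not one, which is exactly what the error message reports.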

So in the schema, you should have:

schema = Schema(id=ID(stored=True, unique=True), stem_text=TEXT(stored=True, analyzer=my_analyzer), pos_tag=TEXT(stored=True, analyzer=pos_analyser))
