Mapping entity IDs to strings in spaCy 3.0

Posted 2024-09-29 01:38:46


I have trained a simple NER pipeline with spaCy 3.0. After training, I want to get a list of the predicted IOB tags from a Doc (doc = nlp(text)), for example ["O", "O", "B", "I", "O"].

I can get the integer IOB codes using

>>> doc.to_array("ENT_IOB")
array([2, 2, ..., 2], dtype=uint64)

But how do I get the mapping (lookup) from these integers to the tag strings?

I couldn't find any such lookup table in doc.vocab.lookups.tables.

I also know I can get the same result by reading ent_iob_ on each token ([token.ent_iob_ for token in doc]), but I'd like to know whether there is a better way.


1 answer
User
#1 · Posted 2024-09-29 01:38:46

Check the Token attribute documentation:

  • ent_iob IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set.
  • ent_iob_ IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and "" means no entity tag is set.
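The two attribute descriptions above amount to a fixed four-entry table. As a minimal sketch in plain Python (no spaCy needed), the table can be written down exactly as documented and used to decode a sequence of integer codes. The names IOB_CODE_TO_TAG and decode_iob are hypothetical illustrations, not part of spaCy's API:

```python
from typing import Iterable, List

# Integer ent_iob codes and their string forms, as described in the
# Token attribute documentation quoted above.
IOB_CODE_TO_TAG = {
    0: "",   # no entity tag is set
    1: "I",  # inside an entity
    2: "O",  # outside an entity
    3: "B",  # begins an entity
}


def decode_iob(codes: Iterable[int]) -> List[str]:
    """Convert integer ent_iob codes to IOB strings (hypothetical helper)."""
    tags = []
    for code in codes:
        code = int(code)  # accepts plain ints as well as NumPy integer scalars
        if code not in IOB_CODE_TO_TAG:
            # Fail loudly instead of silently producing None
            raise ValueError(f"unexpected ent_iob code: {code}")
        tags.append(IOB_CODE_TO_TAG[code])
    return tags


print(decode_iob([2, 2, 3, 1, 2]))  # ['O', 'O', 'B', 'I', 'O']
```

The int() call means the values from doc.to_array("ENT_IOB") could be passed in directly, though .tolist() works just as well. Raising on an unknown code, rather than returning None as dict.get would, is a design choice that may help catch unexpected input early.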

So all you need is a simple dictionary, iob_map = {0: "", 1: "I", 2: "O", 3: "B"}, to map the IDs to their names:

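import spacy

# Assumption: any trained pipeline with an NER component; en_core_web_sm is used here
nlp = spacy.load("en_core_web_sm")
doc = nlp("John went to New York in 2010.")
print([x.text for x in doc.ents])
# => ['John', 'New York', '2010']
# Map the integer ENT_IOB codes (0-3) to their string labels
iob_map = {0: "", 1: "I", 2: "O", 3: "B"}
print(list(map(iob_map.get, doc.to_array("ENT_IOB").tolist())))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']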
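To see how the decoded tags line up with doc.ents, here is a minimal plain-Python sketch (no spaCy) that groups IOB strings into half-open token spans. iob_to_spans is a hypothetical helper. It assumes well-formed tags where every entity starts with "B", and it assumes the eight-token split of the example sentence implied by the eight tags printed above:

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Group IOB tags into (start, end) token spans, end exclusive.

    Assumes well-formed input: "B" opens an entity, "I" continues it,
    and "O" or "" means the token is not part of an entity.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # a new "B" closes the previous entity
            start = i
        elif tag == "I":
            continue  # still inside the current entity
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))  # entity runs to the end of the doc
    return spans


tokens = ["John", "went", "to", "New", "York", "in", "2010", "."]
tags = ["B", "O", "O", "B", "I", "O", "B", "O"]
spans = iob_to_spans(tags)
print(spans)  # [(0, 1), (3, 5), (6, 7)]
print([" ".join(tokens[s:e]) for s, e in spans])  # ['John', 'New York', '2010']
```

The recovered spans give the same three entities as doc.ents in the example: John, New York and 2010.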
