Mapping entity IDs to strings in spaCy 3.0

Posted 2024-09-29 01:38:46

I have trained a simple NER pipeline with spaCy 3.0. After training, I want to get a list of predicted IOB tags from a Doc (doc = nlp(text)), for example ["O", "O", "B", "I", "O"].

I can get the integer IOB codes with:

>>> doc.to_array("ENT_IOB")
array([2, 2, ..., 2], dtype=uint64)

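But how do I get the mapping (lookup) from these integer codes to the tag strings?

I did not find any lookup table for this in doc.vocab.lookups.tables.

I also know that reading ent_iob_ on each token ([token.ent_iob_ for token in doc]) gives the same result, but I am wondering whether there is a better way.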

1 answer

Anonymous user

Answer 1 · Posted 2024-09-29 01:38:46

Check the Token documentation:

ent_iob IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set.
ent_iob_ IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and "" means no entity tag is set.

So you only need a simple dictionary, iob_map = {0: "", 1: "I", 2: "O", 3: "B"}, to map the integer codes to their tag names:

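import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: any pipeline with an NER component works here
doc = nlp("John went to New York in 2010.")
print([x.text for x in doc.ents])
# => ['John', 'New York', '2010']
# Integer ENT_IOB codes mapped to tag strings, per the Token documentation
iob_map = {0: "", 1: "I", 2: "O", 3: "B"}
print(list(map(iob_map.get, doc.to_array("ENT_IOB").tolist())))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']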
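
If the conversion is needed in several places, the dictionary lookup could be wrapped in a small helper. The following is only a sketch: the name iob_codes_to_tags is hypothetical and not part of spaCy, and it assumes the code meanings given in the Token documentation quoted above. It accepts any iterable of integer codes, such as doc.to_array("ENT_IOB").tolist(), and raises an error on an unknown code instead of silently returning None as dict.get would.

```python
# Hypothetical helper (not a spaCy API): map integer ENT_IOB codes to tag strings.
# Code meanings follow the Token documentation: 3 = B, 1 = I, 2 = O, 0 = unset.
IOB_MAP = {0: "", 1: "I", 2: "O", 3: "B"}


def iob_codes_to_tags(codes):
    """Convert an iterable of integer IOB codes into a list of tag strings."""
    tags = []
    for code in codes:
        code = int(code)  # also accepts NumPy integer scalars such as uint64
        if code not in IOB_MAP:
            raise ValueError(f"unexpected IOB code: {code}")
        tags.append(IOB_MAP[code])
    return tags


# Example with hand-written codes; with spaCy one would pass
# doc.to_array("ENT_IOB").tolist() instead.
print(iob_codes_to_tags([3, 2, 2, 3, 1, 2, 3, 2]))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']
```

The explicit check is a design choice: a code outside 0 to 3 would most likely indicate that the wrong attribute array was passed in, so failing early seems preferable to producing a list that contains None.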
