Working with Span objects [spaCy, Python]

Posted 2024-09-27 04:19:09


I'm not sure whether this is a really silly question, but here goes.

text_corpus = '''Insurance bosses plead guilty\n\nAnother three US insurance executives have pleaded guilty to fraud charges stemming from an ongoing investigation into industry malpractice.\n\nTwo executives from American International Group (AIG) and one from Marsh & McLennan were the latest. The investigation by New York attorney general Eliot Spitzer has now obtained nine guilty pleas. The highest ranking executive pleading guilty on Tuesday was former Marsh senior vice president Joshua Bewlay.\n\nHe admitted one felony count of scheming to defraud and faces up to four years in prison. A Marsh spokeswoman said Mr Bewlay was no longer with the company. Mr Spitzer\'s investigation of the US insurance industry looked at whether companies rigged bids and fixed prices. Last month Marsh agreed to pay $850m (£415m) to settle a lawsuit filed by Mr Spitzer, but under the settlement it "neither admits nor denies the allegations".\n'''

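import spacy

def get_entities(document_text, model):
    # Run the pipeline and keep only person, organisation and location entities.
    analyzed_doc = model(document_text)
    # Note: en_core_web_sm labels people as "PERSON", not "PER".
    entities = [entity for entity in analyzed_doc.ents if entity.label_ in ["PERSON", "ORG", "LOC", "GPE"]]
    return entities

model = spacy.load("en_core_web_sm")
# Each call builds a new Doc, so the two lists hold different Span objects.
entities_1 = get_entities(text_corpus, model)
entities_2 = get_entities(text_corpus, model)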

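But when I run the following

entities_1[0] in entities_2

the output is False.

Why is that? The two entity lists contain the same entities, yet an item from one list is not found in the other. This seems very strange. Can someone explain why this happens?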


1 answer
User
Answer 1 · posted 2024-09-27 04:19:09

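This comes down to how ents are represented in spaCy. They are instances of a class (Span) with its own implementation, so even entities_2[0] == entities_1[0] evaluates to False. From the look of it, the Span class has no __eq__ implementation, which, at first glance at least, is the simple reason why.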
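To illustrate the general Python behaviour this explanation relies on, here is a minimal sketch. It is not spaCy's actual code: `PlainSpan` is a hypothetical stand-in for any class that defines `__repr__` but no `__eq__`. Such objects fall back to identity comparison, and `in` on a list relies on that same comparison.

```python
class PlainSpan:
    """Hypothetical stand-in: defines __repr__ but not __eq__ (not a spaCy class)."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Printing shows only the text, so two distinct objects look identical.
        return self.text


a = PlainSpan("Marsh")
b = PlainSpan("Marsh")

same_repr = repr(a) == repr(b)   # both print as "Marsh"
equal = a == b                   # default == falls back to identity
found = a in [b]                 # `in` checks identity first, then ==
found_self = a in [a]            # the very same object is always found
found_by_text = a.text in [s.text for s in [b]]  # strings compare by value

print(same_repr, equal, found, found_self, found_by_text)
```

Two objects that print identically can therefore still fail an `in` check, which matches what the question observed.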

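If you print the value of entities_2[0], you will see the entity's text, but that is only because the Span class implements a __repr__ method in the same file. If you want a boolean comparison, one option is to use the text attribute of Span and do something like this: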

entities_1[0].text in [e.text for e in entities_2]
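Comparing by text alone treats every mention of the same string as the same entity. If position and label matter too, one possible variation is to compare on a tuple of values. The sketch below is only an illustration: `span_key` and `contains_span` are hypothetical helpers, and `StandInSpan` is a stand-in used so the example runs without a model. It assumes the objects expose `text`, `start_char`, `end_char` and `label_`, which are attribute names spaCy's Span provides.

```python
def span_key(span):
    """Value-based key built from text, character offsets and label."""
    return (span.text, span.start_char, span.end_char, span.label_)


def contains_span(span, spans):
    """Return True if any item in `spans` has the same key as `span`."""
    return span_key(span) in {span_key(s) for s in spans}


class StandInSpan:
    """Hypothetical stand-in with the same attribute names as a spaCy Span."""

    def __init__(self, text, start_char, end_char, label_):
        self.text = text
        self.start_char = start_char
        self.end_char = end_char
        self.label_ = label_


first = StandInSpan("Marsh", 10, 15, "ORG")
second = StandInSpan("Marsh", 10, 15, "ORG")   # same mention, different object
other = StandInSpan("Marsh", 40, 45, "ORG")    # same text, different position

match_same_position = contains_span(first, [second])
match_other_position = contains_span(first, [other])
match_by_text_only = first.text in [s.text for s in [other]]

print(match_same_position, match_other_position, match_by_text_only)
```

With real data you would call `contains_span(entities_1[0], entities_2)`. Whether offsets belong in the key depends on whether two mentions of the same name should count as one entity.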

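Edit:

As @abb pointed out, Span does implement __richcmp__, but the comparison only succeeds for what is effectively the same Span, because it checks the positions of the tokens themselves. Spans taken from two separately created Doc objects, as in the question, therefore do not compare equal even when their text matches.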
