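DSL for building language rules
Detailed description of the rita-dsl Python project
RITA DSL
This is a language, loosely based on Apache UIMA RUTA, focused on writing manual language rules that compile into spaCy-compatible patterns. These patterns can be used for manual NER as well as in other processes, such as retokenizing and pure matching.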
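Documentation
Extending - inject custom macros to be used in rule generation
Quick Start
Install via pip install rita-dsl
You can start by creating a file with the extension *.rita
Below is a complete example that can be used as a reference point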
cars = LOAD("examples/cars.txt") # Load items from file
colors = {"red", "green", "blue", "white", "black"} # Declare items inline
{IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR") # If first token is in list `colors` and second one is word `car`, label it
{IN_LIST(cars), WORD+} -> MARK("CAR_MODEL") # If first token is in list `cars` and is followed by 1..N words, label it
{ENTITY("PERSON"), LEMMA("like"), WORD} -> MARK("LIKED_ACTION") # If first token is Person, followed by any word which has lemma `like`, label it
Now you can compile these rules: rita -f <your-file>.rita output.jsonl
And load them into spaCy:
import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.from_disk("output.jsonl")
nlp.add_pipe(ruler)
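For orientation, the file read by ruler.from_disk is in JSON Lines form: one pattern object per line, each with a "label" and a token "pattern". The sketch below hand-writes an approximation of the first rule in that form using only the standard library. The exact patterns RITA emits are not shown in this description, so the attribute choices here (LOWER, IN) and the helper names to_jsonl and from_jsonl are assumptions for illustration only.

```python
import json

# Hand-written approximation of the CAR_COLOR rule. The patterns RITA
# actually generates may use different attributes or structure.
colors = ["red", "green", "blue", "white", "black"]
car_color = {
    "label": "CAR_COLOR",
    "pattern": [
        {"LOWER": {"IN": colors}},  # first token is one of the listed colors
        {"LOWER": "car"},           # second token is the word "car"
    ],
}


def to_jsonl(patterns):
    """Serialize patterns as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of pattern dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonl_text = to_jsonl([car_color])
print(jsonl_text)
```

Inspecting the real output.jsonl in the same way is a quick check that a rule compiled to what you intended.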
Every time you parse text with spaCy, it runs the usual workflow and applies these rules.
text = """Johny Silver was driving a red car. It was BMW X6 Mclass. Johny likes driving it very much."""
doc = nlp(text)
entities = [(e.text, e.label_) for e in doc.ents]
print(entities)
assert entities[0] == ("Johny Silver", "PERSON")  # Normal NER
assert entities[1] == ("red car", "CAR_COLOR")  # Our first rule
assert entities[2] == ("BMW X6 Mclass", "CAR_MODEL")  # Our second rule
assert entities[3] == ("Johny likes driving", "LIKED_ACTION")  # Our third rule
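To see why "red car" is labeled by the first rule, here is a toy matcher in plain Python. It is not how spaCy or RITA match internally; it only mirrors the rule's logic (a color token directly followed by the word "car"). Whitespace tokenization and the function name find_color_car are simplifying assumptions.

```python
def find_color_car(tokens, colors):
    """Return (start, end) token spans where a color is directly followed by 'car'."""
    spans = []
    for i in range(len(tokens) - 1):
        if tokens[i].lower() in colors and tokens[i + 1].lower() == "car":
            spans.append((i, i + 2))  # end index is exclusive
    return spans


colors = {"red", "green", "blue", "white", "black"}
# Pre-split sentence; real tokenization is done by spaCy.
tokens = "Johny Silver was driving a red car .".split()

spans = find_color_car(tokens, colors)
matches = [" ".join(tokens[start:end]) for start, end in spans]
print(matches)
```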
Alternatively, if rita is used as a dependency in your project and you prefer to compile rules dynamically, you can do it like this:
import rita
import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
patterns = rita.compile("examples/color-car.rita")
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)