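DSL for building language rules

Detailed description of the rita-dsl Python project


RITA DSL

This is a language, loosely based on the Apache UIMA RUTA language, focused on writing manual language rules, which compile into spaCy compatible patterns. These patterns can be used for manual NER as well as in other processes, such as retokenizing and pure matching.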

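Documentation

Quick Start

Install it via pip install rita-dsl

You can start defining rules by creating a file with the extension *.rita

Below is a complete example which can be used as a reference point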

cars = LOAD("examples/cars.txt") # Load items from file
colors = {"red", "green", "blue", "white", "black"} # Declare items inline

{IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR") # If first token is in list `colors` and second one is word `car`, label it

{IN_LIST(cars), WORD+} -> MARK("CAR_MODEL") # If first token is in list `cars` and is followed by 1..N words, label it

{ENTITY("PERSON"), LEMMA("like"), WORD} -> MARK("LIKED_ACTION") # If first token is Person, followed by any word which has lemma `like`, label it

Now you can compile these rules: rita -f <your-file>.rita output.jsonl

and load them into spaCy:

import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.from_disk("output.jsonl")
nlp.add_pipe(ruler)

Every time you parse text with spaCy, it runs the usual workflow and applies these rules

text = """Johny Silver was driving a red car. It was BMW X6 Mclass. Johny likes driving it very much."""

doc = nlp(text)
entities = [(e.text, e.label_) for e in doc.ents]
print(entities)

assert entities[0] == ("Johny Silver", "PERSON")  # Normal NER
assert entities[1] == ("red car", "CAR_COLOR")  # Our first rule
assert entities[2] == ("BMW X6 Mclass", "CAR_MODEL")  # Our second rule
assert entities[3] == ("Johny likes driving", "LIKED_ACTION")  # Our third rule
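
To make the first rule's meaning concrete, here is a minimal plain-Python sketch of how {IN_LIST(colors), WORD("car")} can be read: a token from the list, directly followed by the word car. This is an illustration only, not how rita works internally (rita compiles rules into spaCy patterns). The function name find_car_colors, the regex tokenizer, and the case-insensitive comparison are all assumptions made for this example.

```python
# Plain-Python reading of {IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR").
# Illustration only: rita compiles rules to spaCy patterns and does not run code like this.
import re

COLORS = {"red", "green", "blue", "white", "black"}


def find_car_colors(text):
    """Return (span_text, label) pairs where a color token is directly followed by 'car'."""
    # Assumption: a crude regex tokenizer stands in for spaCy's tokenizer.
    tokens = re.findall(r"\w+", text)
    matches = []
    for first, second in zip(tokens, tokens[1:]):
        # Assumption: comparison is case-insensitive.
        if first.lower() in COLORS and second.lower() == "car":
            matches.append((f"{first} {second}", "CAR_COLOR"))
    return matches


print(find_car_colors("Johny Silver was driving a red car."))
```

The same two-token reading explains why "red car" is labeled CAR_COLOR in the assertions above, while a color followed by any other word would not match.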

Alternatively, if rita is used as a dependency in your project and you prefer to compile rules dynamically, you can do it like this:

import rita
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
patterns = rita.compile("examples/color-car.rita")
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
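
Before handing compiled rules to spaCy, it can help to check what was actually produced. The sketch below reads a JSON Lines pattern file and counts patterns per label using only the standard library. It assumes each entry is a JSON object with a "label" key, as in the pattern format spaCy's EntityRuler accepts; the helper names read_patterns and count_labels are hypothetical and not part of rita.

```python
import json
from collections import Counter


def read_patterns(path):
    """Read a JSON Lines pattern file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def count_labels(patterns):
    """Count how many patterns exist for each label.

    Assumption: every pattern is a dict with a "label" key.
    """
    return Counter(pattern["label"] for pattern in patterns)
```

For example, count_labels(read_patterns("output.jsonl")) would show whether every MARK label from the rules file made it into the compiled output. The same count_labels call should also work on an in-memory list of patterns, provided the entries have the same shape.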
