Concept annotation tool for Electronic Health Records



Medical Concept Annotation Tool

A simple tool for concept annotation from UMLS or any other source.

Demo

A demo application is available at MedCAT. Please note that it was trained on MedMentions and contains only a very small portion of UMLS (<1%).

Install using pip

  1. Install medcat

      pip install --upgrade medcat

  2. Install the scispacy model

      pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.0/en_core_sci_md-0.2.0.tar.gz

  3. Download the vocabulary and CDB from the "Models" section below

  4. How to use:

    from medcat.cat import CAT
    from medcat.utils.vocab import Vocab
    from medcat.cdb import CDB

    vocab = Vocab()
    # Load the vocab model you downloaded
    vocab.load_dict('<path to the vocab file>')

    # Load the cdb model you downloaded
    cdb = CDB()
    cdb.load_dict('<path to the cdb file>')

    # Create cat
    cat = CAT(cdb=cdb, vocab=vocab)
    cat.train = False

    # Test it
    doc = "My simple document with kidney failure"
    doc_spacy = cat(doc)
    # Entities are in
    doc_spacy._.ents
    # Or to get a JSON
    doc_json = cat.get_json(doc)

    # To have a look at the results:
    from spacy import displacy
    # Note that this will not show all entities, but only the longest ones
    displacy.serve(doc_spacy, style='ent')

    # To train (unsupervised), set the train flag to True and run
    # documents through MedCAT
    cat.train = True

    # To run cat on a large number of documents; this will
    # also run training, as the flag is set to True.
    data = [('<doc_id>', '<text>'), ('<doc_id>', '<text>')]  # and so on
    docs = cat.multi_processing(data)

    # To explicitly run training you can do
    with open("<some file with a lot of medical text>", 'r') as f:
        # If you want to fine-tune, set it to True; old training will be preserved
        cat.run_training(f, fine_tune=True)
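The `data` argument passed to `cat.multi_processing` above is a list of `(doc_id, text)` pairs. A minimal sketch of assembling such a list from a folder of plain-text files follows, using only the standard library. The helper name `load_documents`, the one-file-per-document `.txt` layout, and the use of the file stem as the document id are all assumptions made for illustration; none of them is part of MedCAT.

```python
from pathlib import Path
import tempfile


def load_documents(folder):
    """Return a list of (doc_id, text) tuples, one per .txt file in `folder`.

    Hypothetical helper: the file stem is used as the document id.
    """
    data = []
    for path in sorted(Path(folder).glob("*.txt")):
        data.append((path.stem, path.read_text(encoding="utf-8")))
    return data


# Self-contained demonstration using a temporary folder with two notes.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "note_1.txt").write_text("Patient with kidney failure.", encoding="utf-8")
    Path(tmp, "note_2.txt").write_text("No acute findings.", encoding="utf-8")
    data = load_documents(tmp)

print(data)
```

The resulting list has the shape expected by the `cat.multi_processing(data)` call in the usage example.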

Building a new concept database

    from medcat.cat import CAT
    from medcat.utils.vocab import Vocab
    from medcat.cdb import CDB
    # Assumed module path for PrepareCDB; check it against your installed version
    from medcat.prepare_cdb import PrepareCDB

    vocab = Vocab()
    # Load the vocab model you downloaded
    vocab.load_dict('<path to the vocab file>')

    # If you have an existing CDB
    cdb = CDB()
    cdb.load_dict('<path to the cdb file>')

    # You can now add concepts from a CSV file; examples of the files
    # can be found in ./examples
    preparator = PrepareCDB(vocab=vocab)
    csv_paths = ['<path to your csv_file>', '<another one>', ...]
    # e.g.
    csv_paths = ['./examples/simple_cdb.csv']
    cdb = preparator.prepare_csvs(csv_paths)

    # Save the new CDB for later
    cdb.save_dict("<path to a file where it will be saved>")

    # Done
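The CSV files passed to `prepare_csvs` can be generated with the standard library. The sketch below writes a small concept file; the column names `cui` and `str` are an assumption about the expected layout and should be checked against `./examples/simple_cdb.csv`. The helper `write_concepts_csv`, the identifier `EX1`, and the concept names are placeholders for illustration only.

```python
import csv
import os
import tempfile

# Assumed column names ("cui" and "str"); verify them against
# ./examples/simple_cdb.csv before relying on this layout.
FIELDNAMES = ["cui", "str"]

# Placeholder rows: two names that map to the same made-up concept id.
rows = [
    {"cui": "EX1", "str": "kidney failure"},
    {"cui": "EX1", "str": "renal failure"},
]


def write_concepts_csv(path, rows):
    """Write concept rows to a CSV file with a header line (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "my_concepts.csv")
write_concepts_csv(csv_path, rows)

# Read the file back to inspect what was written.
with open(csv_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

A path produced this way could then be listed in `csv_paths`.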

Requirements if building from source

    python >= 3.5

Everything else can be installed with pip from the requirements.txt file by running:

    pip install -r requirements.txt
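When building from source, it can help to fail early on an unsupported interpreter. The following is a small sketch of such a check for the `python >= 3.5` requirement; the function name `python_is_supported` is hypothetical and the check is not part of the project.

```python
import sys

MIN_PYTHON = (3, 5)  # minimum version taken from the requirement above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Stop with a readable message if the running interpreter is too old.
if not python_is_supported():
    raise SystemExit("Python >= 3.5 is required, found %d.%d" % sys.version_info[:2])
```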

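Results

| Dataset     | Soft F1 | Description                                                                                       |
|-------------|---------|---------------------------------------------------------------------------------------------------|
| MedMentions | 0.84    | The whole MedMentions dataset without any modifications or supervised training                    |
| MedMentions | 0.828   | MedMentions only for concepts that require disambiguation, or names that map to more than one CUI |
| MedMentions | 0.97    | MedMentions filtered by TUI to only concepts that are a disease                                   |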
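The scores above are reported as soft F1, but the matching criterion behind "soft" is not defined in this description. As a reminder of the underlying formula only, F1 is the harmonic mean of precision and recall, and the sketch below computes it from raw counts. The function name and the counts are made up for illustration and are not MedCAT results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 = 2 * P * R / (P + R) from raw counts."""
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only; not taken from any evaluation.
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 2))
```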

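Models

A basic trained model is publicly available for the vocabulary and the CDB. It was trained on the ~35K concepts available in MedMentions. It is quite limited, so the performance might not be the best.

Vocabulary: Download (built from MedMentions)

CDB: Download (built from MedMentions)

(Note: this was compiled from MedMentions and does not contain any data from NLM, as that data is not publicly available.)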

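Acknowledgement

Entity extraction was trained on MedMentions, which has ~35K entities from UMLS in total.

The dictionary was compiled from Wiktionary and has ~800K unique words in total. For now it is NOT made publicly available.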
