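medcat Python package - PyPI

Concept annotation tool for Electronic Health Records

Detailed description of the medcat Python project

Medical Concept Annotation Tool

A simple tool for concept annotation from UMLS or any other source.

Demo

A demo application is available at MedCAT. Note that it was trained on MedMentions and contains only a small portion of UMLS (<1%).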

Install using pip

Install medcat

pip install --upgrade medcat

Install the scispacy model

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.0/en_core_sci_md-0.2.0.tar.gz

Download the vocabulary and CDB from the Models section below

How to use:

from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB

vocab = Vocab()
# Load the vocab model you downloaded
vocab.load_dict('<path to the vocab file>')

# Load the cdb model you downloaded
cdb = CDB()
cdb.load_dict('<path to the cdb file>')

# Create cat
cat = CAT(cdb=cdb, vocab=vocab)
cat.train = False

# Test it
doc = "My simple document with kidney failure"
doc_spacy = cat(doc)
# Entities are in
doc_spacy._.ents
# Or to get a json
doc_json = cat.get_json(doc)

# To have a look at the results:
from spacy import displacy
# Note that this will not show all entities, but only the longest ones
displacy.serve(doc_spacy, style='ent')

# To train - unsupervised, set the train flag to True and run
# documents through MedCAT
cat.train = True

# To run cat on a large number of documents, this will
# also run training as the flag is set to True.
data = [('<doc_id>', '<text>'), ('<doc_id>', '<text>')]  # and so on
docs = cat.multi_processing(data)

# To explicitly run training you can do
f = open("<some file with a lot of medical text>", 'r')
# If you want to fine tune, set it to True; old training will be preserved
cat.run_training(f, fine_tune=True)
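The usage example passes `cat.multi_processing` a list of `(doc_id, text)` pairs. As a minimal sketch of how such a list might be assembled, the helper below reads every `.txt` file in a folder and uses the file name (without extension) as the document id. The function name `build_batch`, the folder layout, and the choice of id are assumptions made for illustration; they are not part of medcat.

```python
from pathlib import Path
from typing import List, Tuple


def build_batch(folder: str) -> List[Tuple[str, str]]:
    """Collect (doc_id, text) pairs from every .txt file in `folder`.

    Hypothetical helper: the doc_id is simply the file name without its
    extension, and files are visited in sorted order for reproducibility.
    """
    pairs = []
    for path in sorted(Path(folder).glob("*.txt")):
        pairs.append((path.stem, path.read_text(encoding="utf-8")))
    return pairs
```

The resulting list has the same shape as `data` in the usage example, so it could presumably be handed to `cat.multi_processing` directly.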

Building a new concept database

from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB
# Module path assumed for this medcat release
from medcat.prepare_cdb import PrepareCDB

vocab = Vocab()
# Load the vocab model you downloaded
vocab.load_dict('<path to the vocab file>')

# If you have an existing CDB
cdb = CDB()
cdb.load_dict('<path to the cdb file>')

# You can now add concepts from a CSV file, examples of the files can be found in ./examples
preparator = PrepareCDB(vocab=vocab)
csv_paths = ['<path to your csv_file>', '<another one>', ...]
# e.g.
csv_paths = ['./examples/simple_cdb.csv']
cdb = preparator.prepare_csvs(csv_paths)

# Save the new CDB for later
cdb.save_dict("<path to a file where it will be saved>")
# Done
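`prepare_csvs` reads concepts from CSV files, and the files in ./examples are the authoritative reference for their layout. As a rough sketch only, the helper below writes a two-column file with a `cui,str` header (a concept id and one name per row). That header, the function name `write_concept_csv`, and the ids used in the usage note are assumptions for illustration and should be checked against ./examples/simple_cdb.csv before use.

```python
import csv


def write_concept_csv(path, concepts):
    """Write (cui, name) pairs to `path` as CSV.

    The `cui,str` header is an assumed layout, not taken from this page;
    compare with the files in ./examples before relying on it.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cui", "str"])  # assumed column names
        writer.writerows(concepts)
```

For example, `write_concept_csv("my_concepts.csv", [("1", "kidney failure"), ("2", "diabetes")])` would produce a file whose path could then be listed in `csv_paths`; the ids `1` and `2` are placeholders, not real UMLS CUIs.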

Requirements if building from source

python >= 3.5

Everything else can be installed with pip from the requirements.txt file by running:

pip install -r requirements.txt
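Whichever install route is used, a quick check can confirm that the interpreter meets the `python >= 3.5` requirement and that the packages named on this page are importable. The sketch below uses only the standard library; the function name `check_environment` and the package list (`medcat`, `spacy`, and the `en_core_sci_md` model installed above) are choices made for this example.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 5)
# Top-level module names to look for; adjust to your own setup.
PACKAGES = ("medcat", "spacy", "en_core_sci_md")


def check_environment(version_info=sys.version_info, packages=PACKAGES):
    """Return a dict mapping each check to True (passed) or False (failed)."""
    report = {"python_ok": tuple(version_info[:2]) >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(key, "OK" if ok else "MISSING")
```

This only checks that the modules can be located; it does not load the models or verify their versions.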

Results

Dataset	SoftF1	Description
MedMentions	0.84	The whole MedMentions dataset without any modifications or supervised training
MedMentions	0.828	MedMentions only for concepts that require disambiguation, or names that map to more CUIs
MedMentions	0.97	MedMentions filtered by TUI to only concepts that are a disease

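Models

A basic trained model is made public for the vocabulary and CDB. It is trained for the ~35k concepts available in MedMentions. It is quite limited, so performance might not be the best.

Vocabulary Download - built from MedMentions

CDB Download - built from MedMentions

(Note: this is compiled from MedMentions and does not contain any data from NLM, as that data is not publicly available.)

Acknowledgement

Entity extraction was trained on MedMentions, which in total has ~35k entities from UMLS.

The vocabulary was compiled from Wiktionary and has ~800k unique words in total. For now it is NOT made publicly available.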


medcat 0.2.3.3
