bloatectomy Python package - PyPI

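Bloatectomy: a method for the identification and removal of duplicate text in the bloated notes of electronic health records and other documents.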

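bloatectomy Python project description

Bloatectomy

Bloatectomy takes in a list of notes, a single file (.docx, .txt, .rtf, etc.), or a single string to be marked for duplicates. The marked text and the tokens are written as output.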

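Requirements

Python >= 3.7.x (in order for the regular expressions to work correctly)
re
sys
pandas (optional, only necessary if using MIMIC-III data)
docx (optional, only necessary if the input or output is a Word/docx file)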
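
A quick way to check these requirements before installing is sketched below. It uses only the standard library. The helper name check_requirements is made up for this illustration and is not part of bloatectomy, and the sketch assumes the optional packages are importable under the module names pandas and docx, as listed above.

```python
import importlib.util
import sys


def check_requirements(min_version=(3, 7), optional=("pandas", "docx")):
    """Report whether the interpreter and optional modules look usable.

    Hypothetical helper for illustration; not part of bloatectomy.
    """
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in optional:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_requirements().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

A missing pandas or docx entry only matters for MIMIC-III data or Word files respectively.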

Installation

Using Anaconda or Miniconda

conda install -c summerkrankin bloatectomy

Using pip via PyPI
Be sure to install it to python3 if your default is python2.

python3 -m pip install bloatectomy

Using pip via GitHub

python3 -m pip install git+git://github.com/MIT-LCP/bloatectomy

Manual install by cloning the repository

git clone git://github.com/MIT-LCP/bloatectomy
cd bloatectomy
python3 setup.py install

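Examples

To run bloatectomy on a sample string with the following options:

highlight duplicates
display the raw results
output the file as html
output a file of numbered tokens: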

from bloatectomy import bloatectomy

text = '''Assessment and Plan
61 yo male Hep C cirrhosis
Abd pain:
-other labs: PT / PTT / INR:16.6//    1.5, CK / CKMB /
ICU Care
-other labs: PT / PTT / INR:16.6//    1.5, CK / CKMB /
Assessment and Plan
'''

bloatectomy(text, style='highlight', display=True, filename='sample_txt_highlight_output', output='html', output_numbered_tokens=True)
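
What the options in this call amount to can be illustrated with a small stand-alone sketch. This is not bloatectomy's own code: the function names, the <mark> and <b> markup, and the [DUPLICATE] label are assumptions chosen for illustration, and the tokens are split on line breaks rather than with the package's regular expressions. The idea is that the first occurrence of a token is kept and later repeats are highlighted, bolded, or removed.

```python
def mark_duplicates(tokens, style="highlight"):
    """Return tokens with repeated entries marked or removed.

    Illustrative only: the markup is an assumption, not the
    package's actual output format.
    """
    wrappers = {"highlight": "<mark>{}</mark>", "bold": "<b>{}</b>"}
    if style not in wrappers and style != "remov":
        raise ValueError(f"unknown style: {style!r}")
    seen = set()
    marked = []
    for token in tokens:
        key = token.strip()
        if key in seen:
            if style != "remov":
                marked.append(wrappers[style].format(token))
            # with style="remov" the repeat is simply dropped
        else:
            seen.add(key)
            marked.append(token)
    return marked


def number_tokens(tokens):
    """Enumerate tokens and flag repeats, one line per token."""
    seen = set()
    lines = []
    for number, token in enumerate(tokens):
        key = token.strip()
        flag = " [DUPLICATE]" if key in seen else ""
        seen.add(key)
        lines.append(f"{number}{flag} {token}")
    return lines


text = """Assessment and Plan
61 yo male Hep C cirrhosis
ICU Care
61 yo male Hep C cirrhosis
"""
tokens = text.splitlines()
print("\n".join(mark_duplicates(tokens, style="highlight")))
print("\n".join(number_tokens(tokens)))
```

A numbered listing like this makes it easy to see which tokens a given style would affect before choosing remov.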

To use the sample text or load the ipynb examples, download the repository or just the bloatectomy_examples folder.

cd bloatectomy_examples
from bloatectomy import bloatectomy

bloatectomy('./input/sample_text.txt',
            style='highlight', display=False,
            filename='./output/sample_txt_highlight_output',
            output='html',
            output_numbered_tokens=True,
            output_original_tokens=True)

Documentation

The paper is located at TBA

class bloatectomy(input_text,
                  path = '',
                  filename='bloatectomized_file',
                  display=False,
                  style='highlight',
                  output='html',
                  output_numbered_tokens=False,
                  output_original_tokens=False,
                  regex1=r"(.+?\.[\s\n]+)",
                  regex2=r"(?=\n\s*[A-Z1-9#-]+.*)",
                  postgres_engine=None,
                  postgres_table=None)

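Parameters

input_text: file, str, list
An input document (.txt, .rtf, .docx), a string of raw text, or a list of hadm_ids for a postgres MIMIC-III database.

style: str, optional, default=highlight
Method for denoting a duplicate. The following are allowed: highlight, bold, remov.

filename: str, optional, default=bloatectomized_file
A string used to name the output file of the bloatectomized document.

path: str, optional, default=''
The directory for output files.

output_numbered_tokens: bool, optional, default=False
If set to True, a .txt file named [filename]_token_numbers.txt is written in which each token is enumerated and marked for duplication. This is useful when diagnosing your own regular expression for tokenization or when testing the remov option for style.

output_original_tokens: bool, optional, default=False
If set to True, a .txt file named [filename]_original_token_numbers.txt is written in which each original (non-marked) token is enumerated but not marked for duplication.

display: bool, optional, default=False
If set to True, the bloatectomized text is displayed in the console on completion.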

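regex1: str, optional, default=r"(.+?\.[\s\n]+)"
The regular expression for the first tokenization. Splits on a period (.) followed by one or more whitespace characters (space, tab, or line break). This can be replaced with any valid regular expression to change the way tokens are created.

regex2: str, optional, default=r"(?=\n\s*[A-Z1-9#-]+.*)"
The regular expression for the second tokenization. Splits at any newline character (\n) that is followed, after optional whitespace, by an uppercase letter, a digit from 1 to 9, a pound sign (#), or a dash. This can be replaced with any valid regular expression to change the way sub-tokens are created.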

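postgres_engine: str, optional
The postgres connection. Only relevant for use with the MIMIC-III dataset. When data is pulled from postgres, the hadm_id of the file is appended to filename (if set) or to the default bloatectomized_file. See the jupyter notebook mimic_bloatectomy_example for example code.

postgres_table: str, optional
The name of the postgres table containing the concatenated notes. Only relevant for use with the MIMIC-III dataset. When data is pulled from postgres, the hadm_id of the file is appended to filename (if set) or to the default bloatectomized_file. See the jupyter notebook mimic_bloatectomy_example for example code.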
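
The default regex1 and regex2 patterns listed above can be tried directly with Python's re module. The sketch below is an assumption about how the two passes might be chained (split with regex1, then split each piece with regex2); the tokenize helper is hypothetical, and the package's internal tokenizer may differ, for example in the flags it passes. The second pattern is a zero-width lookahead, and re.split only splits on empty matches from Python 3.7 onward, which is consistent with the version requirement.

```python
import re

REGEX1 = r"(.+?\.[\s\n]+)"           # default regex1 from the signature
REGEX2 = r"(?=\n\s*[A-Z1-9#-]+.*)"   # default regex2 from the signature


def tokenize(text, regex1=REGEX1, regex2=REGEX2):
    """Two-pass split; hypothetical helper, not the package's code."""
    tokens = []
    # Pass 1: the capturing group keeps each sentence in the result.
    for piece in re.split(regex1, text):
        # Pass 2: zero-width split just before a newline that starts
        # a line beginning with an uppercase letter, 1-9, '#', or '-'.
        for sub in re.split(regex2, piece):
            sub = sub.strip()
            if sub:  # drop the empty strings re.split leaves behind
                tokens.append(sub)
    return tokens


sample = "Pain is stable. Continue meds.\nICU Care\nLabs pending"
print(tokenize(sample))
# ['Pain is stable.', 'Continue meds.', 'ICU Care', 'Labs pending']
```

Swapping in a different pattern for either argument is a convenient way to test a custom tokenization before passing it to bloatectomy.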


bloatectomy 0.0.12
