fr-word-segment: Python package - PyPI

A package that semantically splits misspelled (run-together) words

Project description for fr-word-segment

French Word Segmentation

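When extracting text with an open-source OCR engine such as tesseract, we are quite likely to run into joined words caused by the limited quality of the OCR output.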

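For example, instead of extracting "très bon service", one may end up with "très bonservice". As a result, when doing feature engineering with bag-of-words (BoW), TF-IDF or even word2vec models, the algorithm treats "bonservice" as a single unique feature instead of the two words "bon" and "service".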
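To make the feature-engineering problem concrete, here is a minimal sketch of a whitespace bag-of-words featurizer built on `collections.Counter`. It is a simplified stand-in for a real BoW or TF-IDF vectorizer, not part of fr-word-segment, and it only shows how the joined token changes the feature set for the example phrase.

```python
from collections import Counter


def bag_of_words(document: str) -> Counter:
    """Count lowercase whitespace-separated tokens, like a very simple BoW featurizer."""
    return Counter(document.lower().split())


clean = bag_of_words("très bon service")
ocr = bag_of_words("très bonservice")

print(sorted(clean))          # ['bon', 'service', 'très']
print(sorted(ocr))            # ['bonservice', 'très']
print(set(clean) & set(ocr))  # {'très'}: "bon" and "service" are lost as features
```

Only "très" is shared between the two feature sets, so a model trained on clean text gets no signal from "bonservice".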

To solve this problem, I built a module that performs semantic word segmentation without relying on any predefined corpus.
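The package's own segmentation method is not described here. For comparison only, the sketch below shows the classic dictionary-based approach: dynamic programming over split points that picks the segmentation with the fewest vocabulary words. This is a swapped-in technique, not fr-word-segment's algorithm, and unlike the package it does need a word list; `VOCAB` is a hypothetical toy vocabulary.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy vocabulary; a real system would need a full French lexicon.
# "moi" is a deliberate dead-end prefix of "moins".
VOCAB = {"soit", "moins", "moi", "compliqué", "très", "bon", "service"}


def segment(token: str, vocab: set[str] = VOCAB) -> list[str] | None:
    """Split `token` into the fewest vocabulary words, or return None if impossible."""

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[str, ...] | None:
        # Best segmentation of token[start:], memoized per start index.
        if start == len(token):
            return ()
        candidates = []
        for end in range(start + 1, len(token) + 1):
            word = token[start:end]
            if word in vocab:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        # Prefer the split with the fewest (hence longer) words.
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


print(segment("soitmoinscompliqué"))  # ['soit', 'moins', 'compliqué']
print(segment("bonservice"))          # ['bon', 'service']
```

Memoizing on the start index keeps the search polynomial in the token length instead of exponential.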

Installation

Use the package manager pip to install fr_word_segment.

pip3 install fr-word-segment
python3 -m spacy download fr

Usage

from fr_word_segment import wordseg

# suppose that a French spellchecker detects this token as misspelled
token = "soitmoinscompliqué"

# apply the segmentation function to the given token
result = wordseg.segment_token(token)

# show results
print("raw token is {}".format(token))         # "soitmoinscompliqué"
print("processed token is {}".format(result))  # "soit moins compliqué"
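The usage example segments a single token that a spellchecker has flagged. A minimal sketch of doing the same over a whole OCR line might look like this. `resegment_text`, `is_misspelled`, `KNOWN` and `fake_segment` are hypothetical names: the known-word set stands in for a real French spellchecker, and `fake_segment` stands in for `wordseg.segment_token` so the snippet runs without the package. It assumes, as the printed output above suggests, that `segment_token` returns a space-separated string.

```python
from typing import Callable


def resegment_text(
    text: str,
    is_misspelled: Callable[[str], bool],
    segment_token: Callable[[str], str],
) -> str:
    """Replace each whitespace token flagged as misspelled with its segmentation."""
    out = []
    for token in text.split():
        out.append(segment_token(token) if is_misspelled(token) else token)
    return " ".join(out)


# Hypothetical spellchecker stand-in: a tiny set of known French words.
KNOWN = {"cela", "serait", "bien", "si"}


def is_misspelled(token: str) -> bool:
    return token not in KNOWN


# Hypothetical stub for wordseg.segment_token; with the package installed one would
# pass wordseg.segment_token instead (after: from fr_word_segment import wordseg).
def fake_segment(token: str) -> str:
    return {"soitmoinscompliqué": "soit moins compliqué"}.get(token, token)


print(resegment_text("cela serait soitmoinscompliqué", is_misspelled, fake_segment))
# cela serait soit moins compliqué
```

Passing the segmenter in as a callable keeps the pipeline testable and lets the real `wordseg.segment_token` be swapped in without other changes.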

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT


fr-word-segment 0.1.3