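eleve Python package - PyPI

Lexicon extraction based on the variation of entropy

Detailed project description of eleve

What is ELeVE?

eleve is a library for computing an "autonomy estimate" score for the substrings (all n-grams) of a text corpus.

The autonomy score is based on the normalised variation of branching entropy (nVBE) of strings. See [MagistrySagot2012] for definitions of these terms.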
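To make these terms more concrete, here is a minimal sketch of right branching entropy and its variation on a toy corpus. It is an illustration only, not eleve's implementation: the helper names (`right_branching_entropy`, `vbe`) and the `</s>` end marker are hypothetical, only the forward direction is shown, and the normalisation step that turns VBE into nVBE is left out.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(sentences, max_n=2):
    """Entropy (in bits) of the token following each n-gram, for n <= max_n."""
    successors = defaultdict(Counter)
    for sentence in sentences:
        tokens = list(sentence) + ["</s>"]  # hypothetical end-of-sentence marker
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n):
                successors[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    entropy = {}
    for ngram, counts in successors.items():
        total = sum(counts.values())
        entropy[ngram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def vbe(entropy, ngram):
    """Variation of branching entropy when the last token is appended.

    Only defined here for n-grams of length 2 or more.
    """
    return entropy[ngram] - entropy[ngram[:-1]]


corpus = [
    ["I", "like", "New", "York", "city"],
    ["I", "like", "potatoes"],
    ["potatoes", "are", "fine"],
    ["New", "York", "is", "a", "fine", "city"],
]
h = right_branching_entropy(corpus)
print(vbe(h, ("New", "York")))       # entropy rises after "York"
print(vbe(h, ("like", "potatoes")))  # entropy drops after "potatoes"
```

A rise in entropy at the end of an n-gram suggests a boundary there, which is the intuition behind scoring "New York" higher than "like potatoes".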

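It was developed mainly for unsupervised segmentation of Mandarin Chinese, but it is language-independent and has been used successfully in research on other tasks such as keyphrase extraction.

Full documentation is available at http://pythonhosted.org/eleve/.

In a nutshell

Here is a simple "getting started". First you have to train a model: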

>>> from eleve import MemoryStorage
>>>
>>> storage = MemoryStorage()
>>>
>>> # Then the training itself:
>>> storage.add_sentence(["I", "like", "New", "York", "city"])
>>> storage.add_sentence(["I", "like", "potatoes"])
>>> storage.add_sentence(["potatoes", "are", "fine"])
>>> storage.add_sentence(["New", "York", "is", "a", "fine", "city"])

Then you can query it:

>>> storage.query_autonomy(["New", "York"])
2.0369977951049805
>>> storage.query_autonomy(["like", "potatoes"])
-0.3227022886276245

eleve also stores the occurrence counts of n-grams:

>>> storage.query_count(["New", "York"])
2
>>> storage.query_count(["New", "potatoes"])
0
>>> storage.query_count(["I", "like", "potatoes"])
1
>>> storage.query_count(["potatoes"])
2
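For intuition about where these numbers come from, the counts can be reproduced with a few lines of plain Python. This is a sketch independent of eleve: `count_ngrams` is a hypothetical helper, and it says nothing about how `MemoryStorage` stores its data internally.

```python
from collections import Counter


def count_ngrams(sentences, max_n=3):
    """Count every contiguous n-gram (as a tuple) with 1 <= n <= max_n."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = [
    ["I", "like", "New", "York", "city"],
    ["I", "like", "potatoes"],
    ["potatoes", "are", "fine"],
    ["New", "York", "is", "a", "fine", "city"],
]
counts = count_ngrams(corpus)
print(counts[("New", "York")])          # 2
print(counts[("New", "potatoes")])      # 0 (never seen together)
print(counts[("I", "like", "potatoes")])  # 1
print(counts[("potatoes",)])            # 2
```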

Then you can use it to segment text, using an algorithm that looks for the solution maximising the nVBE of the resulting words:

>>> from eleve import Segmenter
>>> s = Segmenter(storage)
>>> # segment up to 4-grams, if we used the same storage as before.
>>>
>>> s.segment(["What", "do", "you", "know", "about", "New", "York"])
[['What'], ['do'], ['you'], ['know'], ['about'], ['New', 'York']]
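The search for a best segmentation can be sketched as a small dynamic program. The version below assumes the objective is a length-weighted sum of chunk scores, which is an assumption about the objective rather than a description of the internals of eleve's `Segmenter`. The function `segment_tokens` and the toy scorer `toy_score` are hypothetical stand-ins; in practice the score would come from something like `storage.query_autonomy`.

```python
def segment_tokens(tokens, score, max_len=4):
    """Split tokens into chunks of at most max_len tokens, maximising
    the sum of len(chunk) * score(chunk) over all chunks."""
    n = len(tokens)
    best = [0.0] + [float("-inf")] * n  # best[i]: best total for tokens[:i]
    back = [0] * (n + 1)                # back[i]: start of the last chunk
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            chunk = tokens[start:end]
            total = best[start] + len(chunk) * score(chunk)
            if total > best[end]:
                best[end] = total
                back[end] = start
    # Walk the back-pointers from the end to recover the chunks.
    chunks = []
    end = n
    while end > 0:
        chunks.append(tokens[back[end]:end])
        end = back[end]
    return chunks[::-1]


# Toy scores (made up for illustration): one known phrase scores high,
# single tokens are neutral, any other multi-token chunk is penalised.
KNOWN = {("New", "York"): 2.0}


def toy_score(chunk):
    if len(chunk) == 1:
        return 0.0
    return KNOWN.get(tuple(chunk), -1.0)


result = segment_tokens(
    ["What", "do", "you", "know", "about", "New", "York"], toy_score
)
print(result)
```

With these toy scores the only chunk worth merging is "New York", so every other token stays on its own.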

Installation

You need a few dependencies. On Ubuntu:

$ sudo apt-get install python3-dev libboost-python-dev libboost-filesystem-dev libleveldb-dev

Then install eleve:

$ pip install eleve

Or, if you have a local clone of the source folder:

$ python setup.py install

Get the source

The source is hosted on GitHub:

$ git clone https://github.com/kodexlab/eleve

Contribute

Install the development environment:

$ git clone https://github.com/kodexlab/eleve
$ cd eleve
$ virtualenv ENV -p /usr/bin/python3
$ source ENV/bin/activate
$ pip install -r requirements.txt
$ pip install -r requirements.dev.txt

Pull requests are welcome!

Run the tests:

$ make testall

Build the documentation:

$ make doc

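Then open: docs/_build/html/index.html

Warning: eleve must be importable from your Python path to run the tests (and to build the documentation). To achieve this, you can install eleve as a link in your local virtualenv: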

$ pip install -e .

(Note: this is described in pytest's good practices.)
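A quick way to confirm that the editable install worked is to ask Python whether the package can be found, without importing it. This is a small sketch using only the standard library; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(package_name):
    """Return True if a top-level package can be found on the Python path."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once eleve is installed in the active virtualenv.
print(is_importable("eleve"))
```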

References

If you use eleve for an academic publication, please cite this paper:

[MagistrySagot2012]

Magistry, P., & Sagot, B. (2012, July). Unsupervized word segmentation: the case for Mandarin Chinese. In Proceedings of the 50th Annual Meeting of the ACL: Short Papers-Volume 2 (pp. 383-387). http://www.aclweb.org/anthology/P12-2075

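Copyright, license and authors

eleve is available under the LGPL Version 3 license.

eleve was originally designed and prototyped by Pierre Magistry during his PhD. It was then completely rewritten by Korantin Auguste and Emmanuel Navarro (with the help of Pierre).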

