hyperhyper

Python library to construct word embeddings for small data. Still work in progress.

Building upon the work by Omer Levy et al. for Hyperwords.

Why?

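Nowadays, word embeddings are mostly associated with Word2vec or fastText. Those approaches focus on scenarios where an abundance of data is available, and they need a lot of data to work well. That much data is not always at hand. There are alternative methods based on counting word pairs and some math magic around matrix operations. They need far less data. This Python library implements these approaches (somewhat) efficiently, but there is still room for improvement.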
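To make "counting word pairs" concrete, here is a minimal sketch in plain Python. It counts (word, context) pairs inside a symmetric window and turns the counts into positive pointwise mutual information (PPMI) scores, the kind of matrix that count-based methods usually factorize afterwards. The function names `count_pairs` and `ppmi` and the toy sentences are assumptions made for illustration; they are not hyperhyper's API.

```python
import math
from collections import Counter


def count_pairs(sentences, window=2):
    """Count (word, context) pairs within a symmetric window."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """Positive pointwise mutual information for every observed pair."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n
    scores = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        # Negative associations are clipped to zero.
        scores[(word, context)] = max(0.0, pmi)
    return scores


# Toy corpus, invented for this example.
sentences = [
    ["berlin", "is", "a", "city"],
    ["vienna", "is", "a", "city"],
]
pairs = count_pairs(sentences, window=2)
scores = ppmi(pairs)
```

A real pipeline would store these scores in a sparse matrix and reduce it with a truncated SVD to obtain dense word vectors.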

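hyperhyper is based on a paper from 2015. The authors, Omer Levy et al., published their research code as Hyperwords. I tried to port their original software to Python 3 but ended up rewriting large parts of it. So this library was born.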

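Limitations: With hyperhyper you will run into (memory) problems if you need a large vocabulary (the set of possible words). It is fine if your vocabulary has no more than about 50k words. Word2vec and fastText specifically address this curse of dimensionality.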
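A rough back-of-the-envelope calculation shows why the vocabulary size matters. The sketch below assumes the worst case of a dense word-by-context matrix with 4-byte floats; that is an assumption for illustration only, since sparse storage needs less, and `dense_matrix_bytes` is a hypothetical helper, not part of hyperhyper.

```python
def dense_matrix_bytes(vocab_size, bytes_per_value=4):
    """Worst-case size of a dense vocab_size x vocab_size matrix."""
    return vocab_size * vocab_size * bytes_per_value


# Memory grows quadratically with the vocabulary size.
for vocab_size in (10_000, 50_000, 200_000):
    gib = dense_matrix_bytes(vocab_size) / 2**30
    print(f"{vocab_size:>7} words -> {gib:8.1f} GiB")
```

Doubling the vocabulary quadruples this upper bound, which is why the limit is felt quickly.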

Installation

pip install hyperhyper

If you have an Intel CPU, it is recommended to use the MKL library for numpy. Setting up MKL correctly can be challenging. A package by Intel may help you.


Verify that mkl_info is present:

>>> import numpy
>>> numpy.__config__.show()

Disable the internal multithreading of MKL or OpenBLAS.

export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1

This speeds up computation because we use multiprocessing on the outer loop.
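If exporting the variables in the shell is inconvenient (for example in a notebook), the same settings can be applied from Python. This is a minimal sketch; `limit_blas_threads` is a hypothetical helper, and it assumes the BLAS backend reads these variables when it is first loaded, so it should run before numpy is imported.

```python
import os

BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(num_threads=1):
    """Set the thread-count variables read by OpenBLAS and MKL."""
    for name in BLAS_THREAD_VARS:
        os.environ[name] = str(num_threads)
    return {name: os.environ[name] for name in BLAS_THREAD_VARS}


# Call this before the first `import numpy` in the process.
settings = limit_blas_threads(1)
```

With one BLAS thread per worker, the worker processes of the outer loop do not compete with each other for CPU cores.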

Usage

import hyperhyper as hy

corpus = hy.Corpus.from_file('news.2010.en.shuffled')
bunch = hy.Bunch("news_bunch", corpus)
vectors, results = bunch.svd(keyed_vectors=True)

results['results'][1]
>>> {'name': 'en_ws353', 'score': 0.6510955349164682, 'oov': 0.014164305949008499, 'fullscore': 0.641873218557878}

vectors.most_similar('berlin')
>>> [('vienna', 0.6323208808898926), ('frankfurt', 0.5965485572814941), ('munich', 0.5737138986587524), ('amsterdam', 0.5511572360992432), ('stockholm', 0.5423270463943481)]

See the examples for more information.

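The general concepts:

Preprocess the data once and save it in a bunch
Cache all results and record their performance on test data
Make it easy to fine-tune parameters for your data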
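The caching idea in the list above can be sketched as follows. This is not how hyperhyper stores its results; `ResultCache`, `fake_evaluation`, and the parameter names `window` and `dim` are hypothetical and exist only to show the pattern of computing each parameter combination once and reusing the stored result afterwards.

```python
import json
import tempfile
from pathlib import Path


class ResultCache:
    """Hypothetical cache: one JSON file per parameter combination."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, params):
        # Sort keys so the same parameters always map to the same file name.
        key = "_".join(f"{k}-{v}" for k, v in sorted(params.items()))
        return self.directory / f"{key}.json"

    def get_or_compute(self, params, compute):
        path = self._path(params)
        if path.exists():
            return json.loads(path.read_text())
        result = compute(**params)
        path.write_text(json.dumps(result))
        return result


calls = []


def fake_evaluation(window, dim):
    # Stand-in for training plus evaluation; the score is a placeholder.
    calls.append((window, dim))
    return {"window": window, "dim": dim, "score": 0.0}


cache = ResultCache(tempfile.mkdtemp())
first = cache.get_or_compute({"window": 2, "dim": 50}, fake_evaluation)
second = cache.get_or_compute({"dim": 50, "window": 2}, fake_evaluation)
```

The second call returns the stored result without running the evaluation again, which keeps a parameter search cheap to repeat.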

More documentation may come. Until then, you have to read the source code.

Scientific Background

This software is based on the following papers:

Improving Distributional Similarity with Lessons Learned from Word Embeddings, Omer Levy, Yoav Goldberg, Ido Dagan, TACL 2015.
Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks. We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.
The Influence of Down-Sampling Strategies on SVD Word Embedding Stability, Johannes Hellrich, Bernd Kampe, Udo Hahn, NAACL 2019.
The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD-PPMI-type embeddings. This finding seems to explain diverging reports on their stability and lead us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embedding

Development

Install pipenv.
git clone https://github.com/jfilter/hyperhyper && cd hyperhyper && pipenv install && pipenv shell
python -m spacy download en_core_web_sm
pytest tests

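Contributing

If you have a question, found a bug, or want to propose a new feature, have a look at the issues page.

Pull requests are especially welcome when they fix bugs or improve the code quality.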

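Future Work / TODO

Evaluation for analogies
Replace pipenv if it still has not shipped a newer release
Implement counting in a more efficient programming language, e.g. Cython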

Why is this library named `hyperhyper`?

License

BSD-2-Clause.

Sponsoring

This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.


hyperhyper 0.1.1
