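seaqube Python package - PyPI

Semantic Quality Benchmark for Word Embeddings, i.e. natural language models in Python. The short name is "SeaQuBe" or "seaqube". Simply call it "| ˈsi:kjuːb |".

seaqube Python project description

SeaQuBe

Semantic Quality Benchmark for Word Embeddings, i.e. natural language models in Python. Acronym: SeaQuBe or seaqube.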

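Introduction

The idea of this package is to provide text data augmentation strategies that improve the quality of semantic word embeddings. Some text augmentation strategies are already available and are compatible with this package: https://github.com/makcedward/nlpaug.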

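However, this package also provides augmentation strategies of its own, which are set up in the basic example further down.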

Installation

SeaQuBe can be installed from PyPI with: pip install seaqube

External libraries:

TODO:

    nlp.model.wv.__dict__['index2word'] = nlp.model.vocabs()
    nlp.model.wv.__dict__['vectors'] = nlp.model.matrix()

And:

    import nltk; nltk.download('wordnet')

Word Embedding Quality

The standard datasets provided come from: https://github.com/vecto-ai

Using the NLP Loader with the generated models:

This makes the following easier:

    nlp = SeaQuBeNLPLoader.load_model_from_tin_can(model.get(), "small_model")
    nlp("Hi")

Basic Example

See also: the examples/basic_augmentation Jupyter notebook

Import all augmentation methods

    from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
    from seaqube.augmentation.char import QwertyAugmentation
    from seaqube.augmentation.corpus import UnigramAugmentation
    from seaqube.tools.io import load_json

Prepare corpus and sample data

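As a minimal sketch of the data shapes used in the rest of this example, assume that a corpus is a list of tokenized docs (lists of strings) and that `text` is a plain string. The sentences below are invented for illustration; a real corpus would be loaded from disk, presumably with the imported `load_json`.

```python
# Hypothetical sample data; the sentences are invented for illustration.
corpus = [
    ["a", "man", "is", "reading", "the", "email"],
    ["two", "dogs", "are", "playing", "in", "the", "snow"],
    ["a", "woman", "is", "slicing", "an", "onion"],
]

# A single raw string, as used by the string-based augment() calls.
text = "The quick brown fox jumps over the lazy dog ."

# A doc is one tokenized entry of the corpus.
doc = corpus[0]
print(len(corpus), doc)
```

With this layout, `corpus[0]` can be passed wherever a doc is expected and `corpus[0:200]` wherever a (sub)corpus is expected.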

Set up all augmentations:

An (experimental) active-to-passive voice transformer. It only maps one sentence/doc to another.

    a2p = Active2PassiveAugmentation()

An implementation of the Easy Data Augmentation (EDA) method: random word swap, insertion, deletion, and synonym replacement.

    eda = EDAAugmentation(max_length=2)
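To illustrate two of the four operations named above, here is a minimal, self-contained sketch of random swap and random deletion. It is not seaqube's implementation; the function names `random_swap` and `random_deletion` and the parameter `p` are made up for this example.

```python
import random


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, but never return an empty doc."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(list(tokens))]


rng = random.Random(0)  # fixed seed so runs are repeatable
doc = ["someone", "is", "not", "reading", "the", "email"]
print(random_swap(doc, rng))
print(random_deletion(doc, p=0.2, rng=rng))
```

Insertion and synonym replacement additionally need a synonym source such as WordNet, which is why the `nltk.download('wordnet')` step is listed under the external libraries.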

Translate text into another language and back again (using Google Translate).

    translate = TranslationAugmentation(max_length=2)

Replace words with similar ones, using another word embedding.

    embed = EmbeddingAugmentation(max_length=2)
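The idea behind embedding-based replacement can be sketched with a toy vector table and cosine similarity. This is only an illustration under simplifying assumptions: the vectors are invented, every known word is replaced (a real augmenter would typically replace only some), and `most_similar` is a hypothetical helper, not part of seaqube.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
toy_vectors = {
    "email": [0.9, 0.1],
    "letter": [0.8, 0.2],
    "dog": [0.1, 0.9],
    "puppy": [0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the closest other word, or the word itself if it is unknown."""
    if word not in vectors:
        return word
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


doc = ["the", "dog", "reads", "the", "email"]
augmented = [most_similar(token, toy_vectors) for token in doc]
print(augmented)
```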


###### Insert typos into text based on a QWERTY keyboard
```python
qwerty = QwertyAugmentation(replace_rate=0.07, max_length=2)
```
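A keyboard-based typo augmenter can be sketched as follows. This is a simplified illustration, not seaqube's code: the neighbour map is partial and hand-written, and treating `replace_rate` as a per-character replacement probability is an assumption made for this example.

```python
import random

# Partial QWERTY neighbour map, hand-written for this example.
NEIGHBOURS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "r": "edft", "t": "rfgy", "n": "bhjm", "s": "awedxz",
}


def qwerty_typos(text, replace_rate, rng):
    """Replace each mapped character by a neighbouring key with probability replace_rate."""
    out = []
    for ch in text:
        if ch in NEIGHBOURS and rng.random() < replace_rate:
            out.append(rng.choice(NEIGHBOURS[ch]))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # fixed seed so runs are repeatable
print(qwerty_typos("someone is not reading the email", replace_rate=0.07, rng=rng))
```

A low rate such as 0.07 keeps most words intact, so the augmented text stays readable while still containing realistic typing errors.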

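Based on the UDA algorithm, only the unigram method is implemented, which replaces low-meaning words with other low-meaning words. This method needs a corpus, because it has to detect which words carry little meaning.

    unigram = UnigramAugmentation(corpus=corpus, max_length=2)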
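Why a corpus is needed can be seen in a small sketch. Here, low-meaning words are approximated as words with a low inverse document frequency (IDF), i.e. words that occur in most docs. This is a simplified stand-in for the TF-IDF-based scoring used in UDA, not seaqube's implementation; the function names, the `idf_threshold`, and the probability `p` are assumptions for this example.

```python
import math
import random
from collections import Counter


def low_information_words(corpus, idf_threshold):
    """Collect words whose inverse document frequency is at or below the threshold."""
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    n_docs = len(corpus)
    return sorted(w for w, df in doc_freq.items() if math.log(n_docs / df) <= idf_threshold)


def unigram_replace(doc, low_words, p, rng):
    """Swap each low-information token for another low-information word with probability p."""
    out = []
    for token in doc:
        candidates = [w for w in low_words if w != token]
        if token in low_words and candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    return out


corpus = [
    ["the", "man", "is", "reading", "an", "email"],
    ["the", "dog", "is", "chasing", "a", "ball"],
    ["the", "woman", "is", "writing", "a", "letter"],
]
low_words = low_information_words(corpus, idf_threshold=0.5)
print(low_words)
print(unigram_replace(corpus[0], low_words, p=0.5, rng=random.Random(0)))
```

Content words such as "reading" or "email" occur in only one doc, so their IDF is high and they are never touched; only frequent function words are exchanged.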

API Usage

Every augmentation object offers the same set of methods

    # 1. augmenting a string - same syntax as NLPAUG (https://github.com/makcedward/nlpaug)
    print(qwerty.augment(text))
    # or
    print(translate.augment(text))
    # 2. augmenting a doc (token based text)
    print(unigram.doc_augment(doc=corpus[0]))
    # doc_augment can also handle text:
    print(embed.doc_augment(text=text))
    # 3. augmenting a whole corpus
    print(eda(corpus[0:200]))
    # 4. Active2Passive is still experimental:
    a2p.doc_augment(doc=['someone', 'is', 'not', 'reading', 'the', 'email'])
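How one object can serve all three call styles is sketched below with a toy augmenter. The class `UppercaseAugmentation` is hypothetical and exists only for this illustration; in particular, the return types (lists of variants) are an assumption and may differ from what seaqube returns.

```python
class UppercaseAugmentation:
    """Hypothetical toy augmenter that mirrors the three call styles shown above."""

    def doc_augment(self, doc=None, text=None):
        """Augment one doc, given either as a token list or as raw text."""
        if doc is None:
            doc = text.split()
        return [[token.upper() for token in doc]]

    def augment(self, text):
        """Augment a raw string and return the augmented strings."""
        return [" ".join(tokens) for tokens in self.doc_augment(text=text)]

    def __call__(self, corpus):
        """Augment every doc of a corpus and collect all results in one list."""
        augmented = []
        for doc in corpus:
            augmented.extend(self.doc_augment(doc=doc))
        return augmented


upper = UppercaseAugmentation()
print(upper.augment("someone is reading"))
print(upper.doc_augment(doc=["not", "the", "email"]))
print(upper([["a", "b"], ["c"]]))
```

Routing `augment` and `__call__` through `doc_augment` keeps the actual transformation in one place, so a new strategy only has to define how a single doc is changed.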

Next, we apply one method to a corpus, train a model on it, and measure its performance

    from os.path import dirname, join
    from gensim.models import FastText
    # BaseFTGensimModel and SeaQuBeCompressLoader come from seaqube; their import lines are not part of this snippet.

    # tidy up RAM
    del unigram, embed, translate
    corpus_augmented = eda(corpus[0:200])  # augment a small subset
    # save on disk:
    # save_json(corpus_augmented, "augmented_sick.json")

    # To use NLP models with our benchmark tool set, they must implement the 'BaseModelWrapper' interface.
    # We set up a class which implements the fastText NLP model from the gensim package.
    # This is only needed to get the benchmark to run.
    class FTModelStd500V5(BaseFTGensimModel):
        def define_epochs(self):
            return 100

        def define_model(self):
            # `size` is the gensim 3.x parameter name; gensim 4 renamed it to `vector_size`.
            return FastText(sg=1, cbow_mean=1, size=300, alpha=0.025, min_alpha=0.0001, min_n=1, max_n=5,
                            window=5, min_count=1, sample=0.001, negative=5, workers=self.cpus - 1)

        def define_training(self):
            self.model.build_vocab(sentences=self.data, update=False)
            self.model.train(sentences=self.data, total_examples=len(self.data), epochs=self.epochs)

    model = FTModelStd500V5()
    # train the model
    # model.train_on_corpus(corpus_augmented)
    # get a dumped model to store it on disk - or use it in another process
    # model.get()
    # dill_dumper(model.get(), "example_model.dill")
    # or to save a compressed model:
    # SeaQuBeCompressLoader.save_model_compressed(model.get(), "example_model_compressed.dill")
    nlp = SeaQuBeCompressLoader.load_compressed_model(join(dirname(__file__), "..", "examples", "example_model_compressed.dill"), "example")
    del model

Analyze semantic quality with the benchmark tools

    from seaqube.benchmark.corpus4ir import Corpus4IRBenchmark
    from seaqube.benchmark.wordanalogy import WordAnalogyBenchmark
    from seaqube.benchmark.wordsimilarity import WordSimilarityBenchmark

    wsb = WordSimilarityBenchmark(test_set='simlex999')
    print(wsb(nlp.model))  # score=0.008905456556563954

    wab = WordAnalogyBenchmark('google-analogies')
    print(wab(nlp.model))  # score=0.0

    c4ir = Corpus4IRBenchmark(corpus[0:200])  # need the original corpus for setting up IR
    print(c4ir(nlp.model))
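To give an intuition for the word similarity score: benchmarks of the SimLex-999 kind are commonly scored with Spearman's rank correlation between the model's cosine similarities and human ratings. The sketch below shows that computation on invented toy data. Whether seaqube computes its score in exactly this way is an assumption; the helper names are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy vectors and human ratings, invented for illustration.
vectors = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0], "bus": [0.3, 0.8]}
test_set = [("cat", "dog", 8.0), ("car", "bus", 7.5), ("cat", "car", 2.0), ("dog", "bus", 3.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in test_set]
human_scores = [score for _, _, score in test_set]
print(spearman(model_scores, human_scores))
```

A score near 1 means the model orders word pairs the way humans do, while a score near 0, as in the example output above, means the model's similarities are essentially unrelated to the human ratings.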

Setting up the development environment

Tools

 npm install generate-changelog -g 
 # see: https://www.npmjs.com/package/generate-changelog


seaqube 0.0.13b0

