vectorhub Python package - PyPI

A one-liner to encode data into vectors with state-of-the-art models, using tensorflow, pytorch and other open source libraries. Word2Vec, Image2Vec, BERT, etc.

vectorhub project description

Vector Hub is a library for publishing, discovering, and consuming state-of-the-art models that turn data into vectors (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc.).

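There are many ways to extract vectors from data. This library aims to bring in all the state-of-the-art models in a simple manner, so that data can be vectorised with ease.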

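Vector Hub provides:

A low barrier to entry for practitioners (using common methods)
Vectorisation of rich and complex data types such as text, image, and audio in 3 lines of code
Retrieval and lookup of information about a model
An easy way to handle the dependencies of different models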

Quickstart:

New to Vectors

Google Colab Quickstart

Documentation

Full list of models

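Why Vector Hub?

There are thousands of ____2Vec models across different use cases and domains. We want to create a hub where people can aggregate their work and share it with the community.

Think Transformers for NLP, scikit-learn for data scientists.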

Installation:

To get started quickly, install vectorhub:

pip install vectorhub

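After this, our built-in dependency manager will tell you what to install when you instantiate a model. The main types of installation options can be found here: https://hub.getvectorai.com/

To install a different type of model, pass the matching installation option from the page above as an extra, in the same way as the [all] extra below.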

To install all models at once:

pip install vectorhub[all]

We recommend activating a new virtual environment and then installing with the following:

python3 -m pip install virtualenv 
python3 -m virtualenv env 
source env/bin/activate
python3 -m pip install --upgrade pip 
python3 -m pip install vectorhub[all]
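
As a rough illustration of the kind of check a dependency manager can perform before a model is instantiated, the sketch below tests whether a set of modules is importable. It is not vectorhub's actual implementation: the helper name missing_modules and the requirement list are assumptions made for this example.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Hypothetical requirement list for a TensorFlow Hub based encoder.
required = ["tensorflow", "tensorflow_hub"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

A check like this only says whether a module can be found; it does not verify versions.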

Instantiate our auto_encoder class and use any of the models!

from vectorhub.auto_encoder import AutoEncoder
encoder = AutoEncoder.from_model('text/bert')
encoder.encode("Hello vectorhub!")
[0.47, 0.83, 0.148, ...]

You can choose from our list of models:

['text/albert', 'text/bert', 'text/labse', 'text/use', 'text/use-multi', 'text/use-lite', 'text/legal-bert', 'audio/fairseq', 'audio/speech-embedding', 'audio/trill', 'audio/trill-distilled', 'audio/vggish', 'audio/yamnet', 'audio/wav2vec', 'image/bit', 'image/bit-medium', 'image/inception', 'image/inception-v2', 'image/inception-v3', 'image/inception-resnet', 'image/mobilenet', 'image/mobilenet-v2', 'image/resnet', 'image/resnet-v2', 'text_text/use-multi-qa', 'text_text/use-qa', 'text_text/dpr', 'text_text/lareqa-qa']
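
The identifiers in this list appear to follow a modality/name pattern. A minimal sketch of grouping them by modality, assuming that pattern holds for every entry (the helper group_by_modality is hypothetical and not part of vectorhub):

```python
from collections import defaultdict


def group_by_modality(model_ids):
    """Group 'modality/name' identifiers into {modality: [name, ...]}."""
    groups = defaultdict(list)
    for model_id in model_ids:
        # Split only on the first slash, so names keep any later slashes.
        modality, name = model_id.split("/", 1)
        groups[modality].append(name)
    return dict(groups)


# A few identifiers copied from the list above.
sample_ids = ["text/bert", "text/albert", "audio/vggish", "image/bit", "text_text/dpr"]

for modality, names in group_by_modality(sample_ids).items():
    print(modality, names)
```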

Leverage Google Tensorflow Hub's powerful models to create vectors

Vectorise your image in 3 lines of code:

from vectorhub.encoders.image.tfhub import BitSmall2Vec
image_encoder = BitSmall2Vec()
image_encoder.encode('https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png')
[0.47, 0.83, 0.148, ...]

Vectorise your text in 3 lines of code using Google's BERT model:

from vectorhub.encoders.text.tfhub import Bert2Vec
text_encoder = Bert2Vec()
text_encoder.encode('This is sparta!')
[0.47, 0.83, 0.148, ...]

Vectorise your questions and answers in 3 lines of code using Google's USE QA model:

from vectorhub.bi_encoders.text.tfhub import UseQA2Vec
text_encoder = UseQA2Vec()
text_encoder.encode_question('Who is sparta!')
[0.47, 0.83, 0.148, ...]
text_encoder.encode_answer('Sparta!')
[0.47, 0.83, 0.148, ...]

Leverage HuggingFace Transformer's Albert

from vectorhub.encoders.text import Transformer2Vec
text_encoder = Transformer2Vec('albert-base-v2')
text_encoder.encode('This is sparta!')
[0.47, 0.83, 0.148, ...]

Leverage Facebook's Dense Passage Retrieval

from vectorhub.bi_encoders.text_text.torch_transformers import DPR2Vec
text_encoder = DPR2Vec()
text_encoder.encode_question('Who is sparta!')
[0.47, 0.83, 0.148, ...]
text_encoder.encode_answer('Sparta!')
[0.47, 0.83, 0.148, ...]

Easily access information with your model!

# If you want additional information about the model, you can access the attributes below:
text_encoder.definition.repo
text_encoder.definition.description
# If you want all the information in a dictionary, you can call:
text_encoder.definition.create_dict() # returns a dictionary with model id, description, paper, etc.

Easily upload your vectors alongside your documents with Vector AI

from vectorhub.encoders.text import Transformer2Vec
encoder = Transformer2Vec('bert-base-uncased')

from vectorai import ViClient
# username and api_key are your Vector AI credentials; define them before this line
vi_client = ViClient(username, api_key)
docs = vi_client.create_sample_documents(10)
vi_client.insert_documents('collection_name_here', docs, models={'color': encoder.encode})

# Now we can search through our collection 
vi_client.search('collection_name_here', field='color_vector_', vector=encoder.encode('purple'))

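What are Vectors?

Common terminology when working with vectors:

Vectors (aka. embeddings, encodings, neural representations) ~ A list of numbers used to represent a piece of data. E.g. the vector for the word "king" using a Word2Vec model is [0.47, 0.83, 0.148, ...]
____2Vec (aka. models, encoders, embedders) ~ Turns data into vectors, e.g. Word2Vec turns words into vectors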
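
To make the two terms concrete, here is a deliberately simple toy "text2vec" encoder. It counts letters rather than using a learned model, so it only illustrates the shape of the idea (data in, fixed-length list of numbers out). The function name toy_text2vec is an assumption for this sketch and is not part of vectorhub.

```python
import string


def toy_text2vec(text):
    """Encode text as a 26-dimensional vector of letter counts (a-z)."""
    counts = [0.0] * len(string.ascii_lowercase)
    for char in text.lower():
        position = string.ascii_lowercase.find(char)
        if position != -1:  # skip digits, spaces, punctuation, etc.
            counts[position] += 1.0
    return counts


vector = toy_text2vec("king")
print(len(vector))  # every input maps to the same number of dimensions
print(vector)
```

Real ____2Vec models differ in that nearby vectors reflect similar meaning, which letter counts do not capture.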

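How can I use vectors?

Vectors have a broad range of applications. The most common use cases are performing semantic vector search and analysing topics/clusters with vector analytics.

If you are interested in these applications, take a look at Vector AI.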
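
A minimal sketch of vector search, assuming cosine similarity as the ranking measure and a small in-memory dictionary as the collection. The vectors are made-up toy values and the helper names are hypothetical; a production system such as Vector AI would use an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_vector, documents, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy 3-dimensional vectors standing in for encoder output.
documents = {
    "purple": [0.5, 0.0, 0.5],
    "violet": [0.4, 0.1, 0.5],
    "green": [0.0, 1.0, 0.0],
}

print(search([0.5, 0.0, 0.55], documents))
```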

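How can I obtain vectors?

Taking the outputs of layers from deep learning models
Data cleaning, such as one-hot encoding labels
Converting graph representations to vectors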
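
Of the three routes listed, one-hot encoding is the simplest to show in a few lines. The sketch below is one possible pure-Python version; the function name and the choice to sort the classes alphabetically are assumptions for this example.

```python
def one_hot_encode(labels):
    """Return (classes, vectors), with one 0/1 vector per input label."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    vectors = []
    for label in labels:
        vector = [0] * len(classes)
        vector[index[label]] = 1  # mark the position of this label's class
        vectors.append(vector)
    return classes, vectors


classes, vectors = one_hot_encode(["cat", "dog", "cat", "bird"])
print(classes)
for vector in vectors:
    print(vector)
```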

How to upload your 2Vec model

Read here if you would like to contribute your model!

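Philosophy

The goal of VectorHub is to provide a flexible yet comprehensive framework that lets people easily turn their data into vectors, whatever form that data takes. While our focus is largely on simplicity, customisation should always be an option, and the level of abstraction is always up to the model uploader as long as the reason is justified. For example, with text we chose to keep the encoding at the text level rather than the token level, because text selection should not be applied at the token level. This way practitioners know which texts go into the actual vectors (i.e. instead of ignoring '[next][SEP][wo][##rd]', we choose to ignore "next word" explicitly). We think this lets practitioners focus on what matters when encoding.

Similarly, when we turn data into vectors, we convert them to native Python objects. The aim is to remove as many dependencies as possible once the vectors are created, specifically those on deep learning frameworks such as Tensorflow/PyTorch. This allows other frameworks to be built on top of it.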
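
A small sketch of what converting to native Python objects can look like. Here array.array from the standard library stands in for a framework tensor, since both expose a tolist() method; the helper to_native is hypothetical and only handles one-dimensional input.

```python
import array
import json


def to_native(values):
    """Convert a 1-D array-like object to a plain list of Python floats."""
    if hasattr(values, "tolist"):
        # Array and tensor types commonly offer tolist() for this purpose.
        values = values.tolist()
    return [float(value) for value in values]


# Stand-in for the output of a deep learning framework.
framework_output = array.array("d", [0.47, 0.83, 0.148])

vector = to_native(framework_output)
print(type(vector).__name__)
print(json.dumps(vector))  # plain lists serialise without any framework installed
```

Once the vector is a plain list, downstream code needs nothing beyond the standard library to store, send, or compare it.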

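Credit:

This library would not exist without the following libraries and the incredible machine learning community that releases its state-of-the-art models: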

https://github.com/huggingface/transformers
https://github.com/tensorflow/hub
https://github.com/pytorch/pytorch
Word2Vec image - Alammar, Jay (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
https://github.com/UKPLab/sentence-transformers

vectorhub 1.0.7