如何将预训练的快速文本向量转换为gensim mod

2024-06-30 11:42:45 发布

您现在位置:Python中文网/ 问答频道 /正文

如何将预训练的快速文本向量转换为gensim模型? 我需要预测输出字的方法。在

导入gensim 从gensim.模型导入Word2Vec 从gensim.models.wrappers公司导入快速文本

模型_wiki=gensim.models.KeyedVectors.load_word2vec_格式(“维基.ru.vec") model3=Word2Vec(句子=model_wiki)

TypeError Traceback (most recent call last) in ----> 1 model3 = Word2Vec(sentences=model_wiki) # train a model from the corpus

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in init(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab) 765 callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window, 766 seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss, --> 767 fast_version=FAST_VERSION) 768 769 def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in init(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs) 757 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.") 758 --> 759 self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule) 760 self.train( 761 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs) 934 """ 935 total_words, corpus_count = self.vocabulary.scan_vocab( --> 936 sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule) 937 self.corpus_count = corpus_count 938 self.corpus_total_words = total_words

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in scan_vocab(self, sentences, corpus_file, progress_per, workers, trim_rule) 1569 sentences = LineSentence(corpus_file)
1570 -> 1571 total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule) 1572 1573 logger.info(

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in _scan_vocab(self, sentences, progress_per, trim_rule) 1538
vocab = defaultdict(int) 1539 checked_string_types = 0 -> 1540 for sentence_no, sentence in enumerate(sentences): 1541 if not checked_string_types: 1542
if isinstance(sentence, string_types):

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/keyedvectors.py in getitem(self, entities) 337 return self.get_vector(entities) 338 --> 339 return vstack([self.get_vector(entity) for entity in entities]) 340 341 def contains(self, entity):

TypeError: 'int' object is not iterable


Tags: inselfalphamodelssentencescorpusrulefile
1条回答
网友
1楼 · 发布于 2024-06-30 11:42:45

根据Gensim docs,您可以使用^{}函数:

Load the input-hidden weight matrix from Facebook’s native fasttext .bin and .vec output files

下面是一个例子:

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.vec')

相关问题 更多 >