擅长:python、mysql、java
<p>我知道这个问题已经得到了回答,但我要提出一个更简单的解决办法。此解决方案将把google新闻向量加载到一个空白的spacy nlp对象中。</p>
<pre><code>import gensim
import spacy
# Path to google news vectors
google_news_path = "path\to\google\news\\GoogleNews-vectors-negative300.bin.gz"
# Load google news vecs in gensim
model = gensim.models.KeyedVectors.load_word2vec_format(gn_path, binary=True)
# Init blank english spacy nlp object
nlp = spacy.blank('en')
# Loop through range of all indexes, get words associated with each index.
# The words in the keys list will correspond to the order of the google embed matrix
keys = []
for idx in range(3000000):
keys.append(model.index2word[idx])
# Set the vectors for our nlp object to the google news vectors
nlp.vocab.vectors = spacy.vocab.Vectors(data=model.syn0, keys=keys)
>>> nlp.vocab.vectors.shape
(3000000, 300)
</code></pre>