<p>I haven't used GPT-2, but <a href="https://nlp.h-its.org/bpemb/" rel="nofollow noreferrer">bpemb</a> is a good place to start with subword embeddings. From its README:</p>
<blockquote>
<p>BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.</p>
</blockquote>
<p>I used these pre-trained embeddings together with <a href="https://github.com/google/sentencepiece" rel="nofollow noreferrer">sentencepiece</a> in one of my projects, and they turned out to be very useful.</p>
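<p>For intuition, the Byte-Pair Encoding that BPEmb is built on repeatedly merges the most frequent adjacent symbol pair in a frequency-weighted vocabulary. A minimal toy sketch of that merge loop (this is an illustration of the BPE idea, not the bpemb library's API; the corpus and merge count are invented for the example):</p>

```python
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs across the vocabulary,
    # weighted by each word's corpus frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with its concatenation.
    old = " ".join(pair)
    new = "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Toy corpus: words pre-split into characters, with an end-of-word marker.
vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

for _ in range(3):  # perform three merge steps
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)

print(vocab)
# The frequent suffix "e s t </w>" collapses into a single "est</w>" subword.
```

<p>Frequent character sequences become single vocabulary units while rare words stay decomposable into smaller pieces, which is why BPE-based embeddings handle out-of-vocabulary words gracefully.</p>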