BERT sentence encoding tool
Detailed description of the bert-sent-encoding Python project
This is a BERT sentence encoding tool.
Installation
pip install --index-url https://pypi.python.org/simple/ bert-sent-encoding==0.2.0
or
git clone ssh://git@gitlab.leihuo.netease.com:32200/shaojianzhi/bert-sent-encoding.git
cd bert-sent-encoding
python setup.py install
Usage
from bert_sent_encoding import bert_sent_encoding # 1st line
bse = bert_sent_encoding(model_path='bert_sent_encoding/model/chinese_L-12_H-768_A-12', seq_length=64, batch_size=8) # 2nd line
vector = bse.get_vector('你吃饭了吗', word_vector=False, layer=-1) # 3rd line 1. get vector of string
vectors = bse.get_vector(['你吃饭了吗', '已经吃了呀'], word_vector=False, layer=-1) # 4th line 2. get vector list of strings
bse.write_txt2vector(input_file, output_file, word_vector=False, layer=-1) # 5th line 3. get and write vectors of strings
For the 2nd line:
bse = bert_sent_encoding(model_path='bert_sent_encoding/model/chinese_L-12_H-768_A-12', seq_length=64, batch_size=8)
*model_path is required; seq_length and batch_size are optional*
For the 3rd, 4th, and 5th lines:
vector = bse.get_vector('你吃饭了吗', word_vector=False, layer=-1) # 3rd line 1. get vector of string
vectors = bse.get_vector(['你吃饭了吗', '已经吃了呀'], word_vector=False, layer=-1) # 4th line 2. get vector list of strings
bse.write_txt2vector(input_file, output_file, word_vector=False, layer=-1) # 5th line 3. get and write vectors of strings
*word_vector and layer are optional*
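A common next step after encoding two sentences is to compare their vectors. The sketch below computes cosine similarity in plain Python. It assumes that get_vector with word_vector=False returns one flat sequence of floats per input string; the return type is not documented here, so treat that as an assumption. The helper name cosine_similarity and the toy three-dimensional vectors are illustrative only and are not part of bert-sent-encoding.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; return 0.0 by convention.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for the two sentence vectors (assumption: with the package
# installed these would come from bse.get_vector([...], word_vector=False)).
vectors = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
similarity = cosine_similarity(vectors[0], vectors[1])
print(similarity)
```

With real sentence vectors, values closer to 1 indicate more similar encodings.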
For the 5th line:
bse.write_txt2vector(input_file, output_file) # 3. get and write vectors of strings
The paths of the input file and output file are defined by the user. The content of the input file looks like this:
the first line text
the second line text
...
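An input file in this layout can be prepared with a few lines of Python. This is a minimal sketch, assuming one text per line (as the sample above suggests) and UTF-8 encoding, which the project description does not state. The helper name write_input_file and the temporary file location are hypothetical.

```python
import os
import tempfile


def write_input_file(path, texts):
    """Write one text per line (UTF-8 is an assumption, not documented)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # A newline inside a text would split it across two lines,
            # so replace embedded newlines with spaces.
            f.write(text.replace("\n", " ") + "\n")


texts = ['你吃饭了吗', '已经吃了呀']
input_file = os.path.join(tempfile.mkdtemp(), "input.txt")
write_input_file(input_file, texts)

# With the package installed and `bse` created as in the 2nd line,
# the file can then be passed on (output_file is a user-chosen path):
# bse.write_txt2vector(input_file, output_file)
```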