Python bert-base package - PyPI

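Use Google's BERT for Chinese natural language processing tasks such as named entity recognition (NER), and serve the trained models from a server.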

bert-base Python project description

BERT-BiLSTM-CRF-NER
A TensorFlow solution to the NER task using a BiLSTM-CRF model with Google BERT fine-tuning.
Welcome to star this repository!
The Chinese training data ($path/nerdata/) comes from: https://github.com/zjy-ucas/chinesener
The CoNLL-2003 data ($path/nerdata/ori/) comes from: https://github.com/kyzhouhzau/bert-ner
The evaluation code comes from: https://github.com/guillaumegenthial/tf_metrics/blob/master/tf_metrics/__init__.py
This project implements NER on top of Google's BERT code and a BiLSTM-CRF network. It is geared toward Chinese data, but other languages need only small code changes.
This project supports Python 3 only.
###################################################################

Download project and install

You can install this project with:

pip install bert-base==0.0.8 -i https://pypi.python.org/simple

or

git clone https://github.com/macanv/BERT-BiLSTM-CRF-NER
cd BERT-BiLSTM-CRF-NER/
python3 setup.py install

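Update:

2019.2.25: fixed some bugs in the NER service
2019.2.19: added the text classification service
Fixed the missing loss error
Added the label_list parameter to training, so you can pass -label_list xxx to specify the labels used during training.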

Train model:

You can use -help to view the parameters for training the named entity recognition model. data_dir, bert_config_file, output_dir, init_checkpoint, and vocab_file are required.

bert-base-ner-train -help

The train/dev/test datasets look like this:

海 O
钓 O
比 O
赛 O
地 O
点 O
在 O
厦 B-LOC
门 I-LOC
与 O
金 B-LOC
门 I-LOC
之 O
间 O
的 O
海 O
域 O
。 O

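Each line holds a token followed by its label, and sentences are separated by a blank line. The maximum length of each sentence is set by the max_seq_length parameter.
You can get training data from the two git repositories listed above.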
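The token/label format can be read with a few lines of Python. The sketch below is only an illustration, not the project's own loader: `read_sentences` and `over_limit` are hypothetical helper names, and how the training tool itself treats sentences longer than max_seq_length is not shown in this document.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # one (token, label) pair per input line


def read_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group "token label" lines into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        parts = raw.split()
        if not parts:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if len(parts) < 2:
            raise ValueError("expected 'token label', got: %r" % raw)
        # The first field is the token, the last field is its label.
        current.append((parts[0], parts[-1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def over_limit(sentences: List[Sentence], max_seq_length: int) -> List[int]:
    """Return the indexes of sentences with more tokens than max_seq_length."""
    return [i for i, s in enumerate(sentences) if len(s) > max_seq_length]


sample = "厦 B-LOC\n门 I-LOC\n与 O\n\n金 B-LOC\n门 I-LOC\n"
sentences = read_sentences(sample.splitlines())
print(len(sentences), "sentences")
print("too long:", over_limit(sentences, max_seq_length=2))
```

For a real file, pass an open file object instead of `sample.splitlines()`; checking sentence lengths before training makes it easier to pick a sensible max_seq_length.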
You can train the NER model by running the following command:

bert-base-ner-train \
    -data_dir {your dataset dir}\
    -output_dir {training output dir}\
    -init_checkpoint {Google BERT model dir}\
    -bert_config_file {bert_config.json under the Google BERT model dir} \
    -vocab_file {vocab.txt under the Google BERT model dir}
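If you launch training from a script, it can help to check the five required parameters before calling the tool. The following is a minimal sketch under that assumption: `build_train_command` is a hypothetical helper, the paths are placeholders, and only the flags shown in the command above are used.

```python
import shlex
from typing import Dict, List

# The parameters this document lists as required for bert-base-ner-train.
REQUIRED = ("data_dir", "output_dir", "init_checkpoint",
            "bert_config_file", "vocab_file")


def build_train_command(params: Dict[str, str]) -> List[str]:
    """Return the argument list for bert-base-ner-train, or raise if a
    required parameter is missing or empty."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    command = ["bert-base-ner-train"]
    for name in REQUIRED:
        command += ["-" + name, params[name]]
    return command


bert_dir = "/models/chinese_L-12_H-768_A-12"  # placeholder path
command = build_train_command({
    "data_dir": "data/ner",                   # placeholder path
    "output_dir": "output/ner",               # placeholder path
    "init_checkpoint": bert_dir,
    "bert_config_file": bert_dir + "/bert_config.json",
    "vocab_file": bert_dir + "/vocab.txt",
})
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once bert-base is installed; building a list rather than a single string avoids shell quoting problems with paths that contain spaces.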

You can specify the labels yourself with the -label_list parameter; otherwise the project collects the labels from the training data.

# comma-separated list
-labels 'B-LOC, I-LOC ...'
# or save the labels in a file such as labels.txt, one label per line
-labels labels.txt
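The two accepted forms (a comma-separated string, or a file with one label per line) can be told apart roughly as follows. This is a sketch of the idea only; `parse_labels` is a hypothetical function and the tool's actual parsing code is not shown in this document.

```python
import os
import tempfile
from typing import List


def parse_labels(spec: str) -> List[str]:
    """Read labels from a file (one per line) if spec names an existing file,
    otherwise treat spec as a comma-separated list."""
    if os.path.isfile(spec):
        with open(spec, encoding="utf-8") as handle:
            items = handle.read().splitlines()
    else:
        items = spec.split(",")
    labels: List[str] = []
    for item in items:
        label = item.strip()
        if label and label not in labels:  # skip blanks and duplicates
            labels.append(label)
    return labels


inline = parse_labels("B-LOC, I-LOC, O")

# Write a small labels.txt to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("B-LOC\nI-LOC\nO\n")
    from_file = parse_labels(path)

print(inline, from_file)
```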

After training, the NER model is saved in the {output_dir} you specified on the command line above.

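As Service

Much of the server and client code comes from the excellent open source project bert-as-service by hanxiao. If my code violates any license agreement, please let me know and I will correct it as soon as possible. ~~The NER server/client code can be applied to other tasks, such as text classification, with simple modifications; I will provide this later.~~ This project provides named entity recognition and text classification server services. If you want to share your model on GitHub or with my work, you are welcome to submit a request or share the model.

You can use -help to view the parameters of the NER service; model_dir and bert_model_dir are required.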

bert-base-serving-start -help

Then you can start the NER service with the command below:

bert-base-serving-start \
    -model_dir C:\workspace\python\BERT_Base\output\ner2 \
    -bert_model_dir F:\chinese_L-12_H-768_A-12 \
    -model_pb_dir C:\workspace\python\BERT_Base\model_pb_dir \
    -mode NER

or the text classification service:

bert-base-serving-start \
    -model_dir C:\workspace\python\BERT_Base\output\ner2 \
    -bert_model_dir F:\chinese_L-12_H-768_A-12 \
    -model_pb_dir C:\workspace\python\BERT_Base\model_pb_dir \
    -mode CLASS \
    -max_seq_len 202

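As you can see:
mode: if mode is NER or CLASS, the named entity recognition or text classification service is started. If it is BERT, the service behaves the same as the bert-as-service project.
bert_model_dir: the directory of a BERT model, which you can download from https://github.com/google-research/bert
ner_model_dir: your NER model checkpoint directory
model_pb_dir: the directory where the frozen model is saved; after the optimize function runs, it contains a binary file such as ner_model.pb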

You can download my NER model from https://pan.baidu.com/s/lm9vcueq5gf-tjc00sfd88w (extraction code: guqq), or the text classification model from https://pan.baidu.com/s/1opsouh1n5am2hjdio2xcw (extraction code: bbu8).
Put ner_model.pb/classification_model.pb in model_pb_dir and the other files in model_dir. Different models need to be stored separately: for example, put the NER model's label_list.pkl and label2id.pkl in model_dir/ner/ and the text classification files in model_dir/text_classification. The text classification model can classify Chinese data into 12 categories: "游戏" (games), "娱乐" (entertainment), "财" (finance), "时" (current affairs), "股票" (stocks), "育" (education), "社" (society), "体育" (sports), "家" (home), "时尚" (fashion), "房" (real estate), "彩票" (lottery).
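Before starting the service with a downloaded model, you may want to confirm that the files ended up in the expected directories. The check below is a sketch based only on the file names mentioned above (the .pb name for NER is assumed to be ner_model.pb); `missing_service_files` is a hypothetical helper, and the text classification model may need additional files that this document does not name.

```python
import os
import tempfile
from typing import List

# Frozen-model file expected in model_pb_dir for each service mode.
PB_FILES = {"NER": "ner_model.pb", "CLASS": "classification_model.pb"}
# Label files this document names for the NER model directory.
NER_LABEL_FILES = ("label_list.pkl", "label2id.pkl")


def missing_service_files(model_dir: str, model_pb_dir: str,
                          mode: str = "NER") -> List[str]:
    """Return the expected files that do not exist yet."""
    expected = [os.path.join(model_pb_dir, PB_FILES[mode])]
    if mode == "NER":
        expected += [os.path.join(model_dir, name) for name in NER_LABEL_FILES]
    return [path for path in expected if not os.path.isfile(path)]


# Demonstration in a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "model_dir", "ner")
    model_pb_dir = os.path.join(root, "model_pb_dir")
    os.makedirs(model_dir)
    os.makedirs(model_pb_dir)
    before = [os.path.basename(p)
              for p in missing_service_files(model_dir, model_pb_dir)]
    for folder, name in [(model_pb_dir, "ner_model.pb"),
                         (model_dir, "label_list.pkl"),
                         (model_dir, "label2id.pkl")]:
        open(os.path.join(folder, name), "wb").close()  # create an empty file
    after = missing_service_files(model_dir, model_pb_dir)

print("missing before:", before)
print("missing after:", after)
```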

[Screenshots: service startup information]

You can test the service with the client code below:

1. NER Client

[code listing missing]

After running the code above, you can see:

[Screenshot: NER client output]

To customize the tokenization method, only the following simple change to the client code is needed:

[code listing missing]
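A client with custom tokenization might look like the sketch below. Treat it as an assumption rather than the project's documented API: the `BertClient` import path, its constructor arguments, and the `is_tokenized` flag of `encode` are recalled from the project's client example (which follows bert-as-service), and `to_char_tokens` and `tag_sentences` are hypothetical names.

```python
from typing import List


def to_char_tokens(text: str) -> List[str]:
    """Split text into single-character tokens, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def tag_sentences(texts: List[str], mode: str = "NER"):
    """Send pre-tokenized sentences to a running bert-base service.

    Assumption: bert_base.client.BertClient accepts these arguments and
    encode() accepts is_tokenized=True, as in the bert-as-service client.
    """
    # Imported lazily so this module loads without bert-base installed.
    from bert_base.client import BertClient

    with BertClient(show_server_config=False, check_version=False,
                    check_length=False, mode=mode) as client:
        tokens = [to_char_tokens(text) for text in texts]
        return client.encode(tokens, is_tokenized=True)


print(to_char_tokens("厦门 与金门"))
```

With the service running, calling `tag_sentences(["海钓比赛地点在厦门与金门之间的海域。"])` should return the predicted labels; passing `mode="CLASS"` targets the text classification service instead. Replace `to_char_tokens` with your own tokenizer to change the segmentation.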

2. Text Classification Client

[code listing missing]

After running the code above, you can see:

[Screenshot: text classification client output]

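Note that a single command cannot start both the NER service and the text classification service. You can, however, run the command twice to start the two services on different ports.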

The following tutorial is for the old version and will be removed in the future.

How to train

1. Download the BERT Chinese model:

[code listing missing]

2. Create output dir

Create an output directory in the project path:

[code listing missing]

3. Train model

First method

[code listing missing]

Or replace the BERT path and project path in bert_lstm_ner.py:

[code listing missing]

Then run:

[code listing missing]

Using BLSTM-CRF or only CRF for decoding!

Just change the crf_only argument of the add_blstm_crf_layer function at line 450 of bert_lstm_ner.py to True or False.

CRF-only output layer:

[code listing missing]

BiLSTM with CRF output layer:

[code listing missing]

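Result:

All parameters use their default values.

On the dev data set:

[Screenshot: dev set results]

On the test data set:

[Screenshot: test set results]

Entity-level result:

The last two results are label-level results. The entity-level result is computed at lines 796-798 of the code and is printed during prediction. My entity-level result:

[Screenshot: entity-level results]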
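To make the difference between label-level and entity-level scores concrete, here is a small sketch that scores whole entity spans from BIO tags. It is an illustration under common conventions (an entity counts as correct only if its type and both boundaries match), not the project's evaluation code; `bio_spans` and `entity_prf` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    kind: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        if kind is not None and tag == "I-" + kind:
            continue  # still inside the current entity
        if kind is not None:  # the current entity ends before position i
            spans.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]],
               pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 over aligned sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [["O", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]]
pred = [["O", "B-LOC", "I-LOC", "O", "B-LOC", "O"]]
label_accuracy = sum(g == p for g, p in zip(gold[0], pred[0])) / len(gold[0])
print("label accuracy:", label_accuracy)
print("entity P/R/F1:", entity_prf(gold, pred))
```

In this example five of six labels are correct, yet only one of the two entities matches exactly, which is why entity-level scores are usually lower than label-level ones.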

My model can be downloaded from Baidu Cloud:
Link: https://pan.baidu.com/s/1gfdfflectv5393ufbydgqq (extraction code: 4cus)
Note: my model was trained with the crf_only parameter.

Online predict

Once the model has finished training, just run:

[code listing missing]

[Screenshot: online prediction]

Using NER as Service

Service

Using NER as a service is simple; you just need to run the Python script below from the project root path:

[code listing missing]

You can download my NER model from https://pan.baidu.com/s/lm9vcueq5gf-tjc00sfd88w (extraction code: guqq).
Put ner_model.pb in model_pb_dir and the other files in ner_model_dir, then run the command above.

[Screenshots: service startup information]

Client

For client usage, refer to the client_test.py script:

[code listing missing]

Note: for the input format, you can refer to the bert-as-service project.
Client code in other languages, such as Java, is welcome.

Using your own data to train

If you want to train the NER model on your own data, you only need to modify the get_labels function:

[code listing missing]

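Note: the three labels "X", "[CLS]", and "[SEP]" are required; just replace the remaining entries of the returned list with your own data labels.
Alternatively, you can use the following code to let the program collect the labels from the training data automatically:

[code listing missing]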
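A label list that always carries the three required special labels could be built as sketched below. This is not the project's get_labels implementation: the standalone function signature, the `label_to_id` helper, and the choice to start ids at 1 (leaving 0 for padding) are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Labels the note above calls required, regardless of the data set.
SPECIAL_LABELS = ["X", "[CLS]", "[SEP]"]


def get_labels(data_labels: Iterable[str]) -> List[str]:
    """Return the distinct data labels in first-seen order, followed by
    the required special labels."""
    labels: List[str] = []
    for label in data_labels:
        if label not in labels and label not in SPECIAL_LABELS:
            labels.append(label)
    return labels + SPECIAL_LABELS


def label_to_id(labels: List[str]) -> Dict[str, int]:
    """Map each label to an integer id (assumption: 0 is kept for padding)."""
    return {label: index for index, label in enumerate(labels, start=1)}


labels = get_labels(["O", "B-LOC", "I-LOC", "O", "B-LOC"])
label2id = label_to_id(labels)
print(labels)
print(label2id)
```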

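New update

2019.1.30: support pip install and command line control

2019.1.30: add service/client for the NER process

2019.1.9: add code to remove the Adam-related parameters from the model, reducing the model file size from 1.3GB to 400MB.

2019.1.3: add online prediction code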
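The size reduction in the 2019.1.9 entry is plausible because Adam keeps two extra tensors for every trainable weight, so dropping them leaves roughly one third of the checkpoint. The sketch below shows only the name-filtering idea on a toy variable table; the `/adam_m` and `/adam_v` suffixes are assumed from the optimizer in Google's BERT code, and the project's actual pruning code is not shown in this document.

```python
from typing import Dict


def is_adam_slot(name: str) -> bool:
    """True for optimizer slot variables (assumed suffixes adam_m / adam_v)."""
    return name.endswith("/adam_m") or name.endswith("/adam_v")


def strip_adam(variables: Dict[str, int]) -> Dict[str, int]:
    """Keep only the variables needed for inference."""
    return {name: size for name, size in variables.items()
            if not is_adam_slot(name)}


# Toy checkpoint: variable name -> number of elements (hypothetical names).
checkpoint = {
    "bert/encoder/layer_0/output/dense/kernel": 3072 * 768,
    "bert/encoder/layer_0/output/dense/kernel/adam_m": 3072 * 768,
    "bert/encoder/layer_0/output/dense/kernel/adam_v": 3072 * 768,
    "global_step": 1,
}
kept = strip_adam(checkpoint)
ratio = sum(kept.values()) / sum(checkpoint.values())
print(sorted(kept))
print("fraction kept: %.3f" % ratio)
```

A kept fraction near one third matches the reported drop from 1.3GB to about 400MB.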

Reference:

The evaluation code comes from: https://github.com/guillaumegenthial/tf_metrics/blob/master/tf_metrics/__init__.py
https://github.com/google-research/bert
https://github.com/kyzhouhzau/bert-ner
https://github.com/zjy-ucas/chinesener
https://github.com/hanxiao/bert-as-service

If you have any problem, please open an issue or email me (ma_cancan@163.com).

Welcome to join the QQ group: 979659372

bert-base 0.0.9
