toeicbert Python package - PyPI

Solving TOEIC blank problems with a pretrained BERT model in PyTorch.

Detailed description of the toeicbert Python project

TOEIC-BERT

76% accuracy on TOEIC using only the pretrained BERT model!!

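This project is about TOEIC (Test of English for International Communication) problem solving using the pytorch-pretrained-BERT model. I used huggingface's pytorch-pretrained-BERT model because it makes pretraining and fine-tuning easier. I solved only the blank (fill-in-the-blank) problems, not the whole test. There are two types of blank problems: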

Type 1: choose the grammatically correct form.

Q) The teacher had me _________ scales several times a day.
  1. play (Answer)
  2. to play
  3. played
  4. playing

Type 2: choose the correct vocabulary word.

Q) The wet weather _________ her from going shopping.
  1. interrupted
  2. obstructed
  3. impeded
  4. discouraged (Answer)

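Why BERT?

A pretrained BERT model contains contextual information, so it can tell, at least to some degree, which candidate makes a sentence more contextually or grammatically plausible. I was inspired by the grammar checker described in this blog post: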

Can We Use BERT as a Language Model to Assign a Score to a Sentence?
BERT uses a bidirectional encoder to encapsulate a sentence from left to right and from right to left. Thus, it learns two representations of each word-one from left to right and one from right to left-and then concatenates them for many downstream tasks.

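Evaluation

I evaluated only the pretrained BERT model (not fine-tuned) to check for grammatical or lexical errors. In the scoring expression L_n(T_n), X is the question sentence and n indexes the answer choices {a, b, c, d}. C denotes the subset of candidate answer tokens: C of warranty is ['warrant', '##y']. V denotes the total vocabulary.

A candidate that consists of more than one token is a problem, because there is only one blank. I solved this by taking the average of the values for each of its tokens. For example, is being formed is tokenized as ['is', 'being', 'formed'].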
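To make the token subsets above concrete, here is a minimal sketch of greedy longest-match WordPiece splitting. The vocabulary `TOY_VOCAB` and the helper names `wordpiece` and `tokenize` are hypothetical and exist only for illustration; the real BERT vocabulary has roughly 30,000 entries, and the package itself relies on the tokenizer shipped with pytorch-pretrained-BERT.

```python
# Toy vocabulary (an assumption for illustration, not BERT's real vocabulary).
TOY_VOCAB = {"warrant", "##y", "is", "being", "formed", "play", "##ing"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword tokens by greedy longest-match-first."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the window from the right until a vocabulary entry matches.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


def tokenize(phrase, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    return [tok for word in phrase.lower().split() for tok in wordpiece(word, vocab)]


print(tokenize("warranty"))
print(tokenize("is being formed"))
```

A single-word candidate such as warranty and a multi-word candidate such as is being formed both end up as several tokens, which is why a per-candidate average is needed.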

Then we take the argmax over L_n(T_n).

predictions = model(question_tensors, segment_tensors)
# predictions : [batch_size, sequence_length, vocab_size]
predictions_candidates = predictions[0, masked_index, candidate_ids].mean()
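The averaging and argmax steps can be sketched without a model, using a plain list in place of the vocabulary-sized score vector at the masked position. All ids and scores below are made up, and `score_candidate` and `pick_answer` are hypothetical helper names, not functions of the package.

```python
from statistics import mean

# Made-up token ids and scores at the [MASK] position (assumptions for illustration).
TOKEN_IDS = {"play": 0, "to": 1, "played": 2, "playing": 3}
scores_at_mask = [5.0, 1.0, 2.0, 3.0]  # one score per vocabulary id


def score_candidate(scores, candidate_ids):
    """Average the masked-position scores of all tokens in one candidate."""
    return mean(scores[i] for i in candidate_ids)


def pick_answer(scores, candidates):
    """Return the candidate with the highest averaged score (the argmax)."""
    averaged = {
        text: score_candidate(scores, [TOKEN_IDS[tok] for tok in text.split()])
        for text in candidates
    }
    return max(averaged, key=averaged.get), averaged


best, averaged = pick_answer(scores_at_mask, ["play", "to play", "played", "playing"])
print(best, averaged)
```

As in the snippet above it, every token of a candidate is scored at the same masked position, so a two-token candidate such as to play gets the mean of its two token scores.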

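Evaluation results.

Impressive results using only the pretrained BERT model

bert-base-uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
bert-large-uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
bert-base-cased: 12-layer, 768-hidden, 12-heads, 110M parameters
bert-large-cased: 24-layer, 1024-hidden, 16-heads, 340M parameters

A total of 7067 problems were evaluated, with model.eval() called so that dropout is disabled and the predictions are deterministic.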

	bert-base-uncased	bert-base-cased	bert-large-uncased	bert-large-cased
Correct Num	5192	5398	5321	5148
Percent	73.46%	76.38%	75.29%	72.84%
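The percentages follow directly from the correct counts and the total of 7067 problems. The short check below recomputes them; it assumes the values were truncated to two decimals, not rounded, which is what the table is consistent with (5192 / 7067 is about 73.468%). The names `TOTAL`, `CORRECT`, and `truncated_percent` are introduced here for illustration.

```python
TOTAL = 7067
CORRECT = {
    "bert-base-uncased": 5192,
    "bert-base-cased": 5398,
    "bert-large-uncased": 5321,
    "bert-large-cased": 5148,
}


def truncated_percent(correct, total):
    """Return correct/total as a percentage string truncated to two decimals."""
    hundredths = correct * 10000 // total  # integer math avoids float rounding
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


results = {name: truncated_percent(n, TOTAL) for name, n in CORRECT.items()}
best_model = max(CORRECT, key=CORRECT.get)

for name, pct in results.items():
    print(f"{name}: {pct}")
print("best:", best_model)
```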

Quick start with the Python pip package.

Install with pip

$ pip install toeicbert

Run and options

$ python toeicbert -m bert-base-uncased -f test.json

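-m, --model: name of the BERT model from huggingface's pytorch-pretrained-BERT: bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased.
-f, --file: JSON file to evaluate; see the JSON format in test.json.
The keys question, 1, 2, 3, 4 are required; answer is optional.
The _ in question is replaced with [MASK].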

{
  "1": {
    "question": "The teacher had me _ scales several times a day.",
    "answer": "play",
    "1": "play",
    "2": "to play",
    "3": "played",
    "4": "playing"
  },
  "2": {
    "question": "The teacher had me _ scales several times a day.",
    "1": "play",
    "2": "to play",
    "3": "played",
    "4": "playing"
  }
}
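As a rough illustration of the format rules above, the following sketch loads such a JSON document, checks the required keys, and replaces the blank with [MASK]. It is not the package's own loader: `REQUIRED_KEYS`, `load_problems`, and the output field names are assumptions made for this example.

```python
import json

REQUIRED_KEYS = ("question", "1", "2", "3", "4")  # "answer" is optional


def load_problems(text):
    """Parse the JSON text and prepare each problem for a masked language model."""
    prepared = {}
    for pid, item in json.loads(text).items():
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"problem {pid} is missing keys: {missing}")
        prepared[pid] = {
            # The blank marker "_" becomes BERT's mask token.
            "masked": item["question"].replace("_", "[MASK]"),
            "choices": [item[key] for key in ("1", "2", "3", "4")],
            "answer": item.get("answer"),  # None when no answer is given
        }
    return prepared


SAMPLE = """
{"1": {"question": "The teacher had me _ scales several times a day.",
       "answer": "play",
       "1": "play", "2": "to play", "3": "played", "4": "playing"},
 "2": {"question": "The teacher had me _ scales several times a day.",
       "1": "play", "2": "to play", "3": "played", "4": "playing"}}
"""

prepared = load_problems(SAMPLE)
print(prepared["1"]["masked"])
```

Problems without an answer key can still be scored; they just cannot be counted toward accuracy.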

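Author

Tae Hwan Jung (Jeff Jung) @graykode, Kyung Hee University (undergraduate).
Author email: nlkey2022@gmail.com

Thanks to Hwan Suk Gang (Kyung Hee University) for collecting the dataset (7114 problems).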


toeicbert 0.0.2
