How do I test a masked language model after training?

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
>>> unmasker("Hello I'm a [MASK] model.")
[{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
  'score': 0.1073106899857521,
  'token': 4827,
  'token_str': 'fashion'},
 {'sequence': "[CLS] hello i'm a role model. [SEP]",
  'score': 0.08774490654468536,
  'token': 2535,
  'token_str': 'role'},
 {'sequence': "[CLS] hello i'm a new model. [SEP]",
  'score': 0.05338378623127937,
  'token': 2047,
  'token_str': 'new'},
 {'sequence': "[CLS] hello i'm a super model. [SEP]",
  'score': 0.04667217284440994,
  'token': 3565,
  'token_str': 'super'},
 {'sequence': "[CLS] hello i'm a fine model. [SEP]",
  'score': 0.027095865458250046,
  'token': 2986,
  'token_str': 'fine'}]

1 answer

User

#1 · posted on 2024-10-04 09:25:09

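That depends a lot on your task. Your task seems to be masked language modelling, i.e. predicting one or more masked words:

Today I ate ___.

(pizza) or (pasta) could be equally correct, so you cannot use a metric such as accuracy. But (water) should be less "correct" than the other two. So what you normally do is check how "surprised" the language model is by an evaluation dataset. This metric is called perplexity. You compute the perplexity before and after fine-tuning your model on your specific dataset, and you expect it to be lower after fine-tuning, because the model should have become more used to your specific vocabulary and so on. That is how you test your model.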
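To make the "surprise" idea concrete, here is a minimal pure-Python sketch of how perplexity follows from the probabilities a model assigns to the correct masked words: it is the exponential of the average negative log-likelihood, so a model that gives the right words higher probability gets a lower perplexity. The function name `perplexity` and all probability values are hypothetical, chosen only for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) of the correct tokens.

    token_probs: probabilities the model assigned to the correct token at
    each masked position of an evaluation set (hypothetical values below).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical probabilities of the correct masked words, before and after
# fine-tuning on domain text: higher probabilities mean less "surprise".
before = [0.05, 0.10, 0.02, 0.08]
after = [0.20, 0.30, 0.10, 0.25]

print(f"before: {perplexity(before):.1f}")
print(f"after:  {perplexity(after):.1f}")
```

A useful sanity check: if the model spreads its probability uniformly over k candidates, the perplexity is exactly k, which is why perplexity is often read as "how many words the model is effectively choosing between".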

As you can see, the tutorial you mentioned computes the perplexity:

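import math

# trainer is the transformers Trainer from the tutorial; eval_loss is the
# mean cross-entropy loss on the evaluation set
eval_results = trainer.evaluate()
# perplexity = exp(mean cross-entropy loss)
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")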
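To apply the before/after comparison described above, one option is to call `trainer.evaluate()` once before `trainer.train()` and once after, then convert both `eval_loss` values to perplexity. The sketch below only does the arithmetic on the returned dictionaries, so it runs without a model; the helper names and loss values are hypothetical, and the Trainer calls are shown as comments, assuming the `trainer` object from the tutorial.

```python
import math


def perplexity_from_eval(eval_results: dict) -> float:
    # eval_loss is the mean cross-entropy on the evaluation set, so
    # exp(eval_loss) is the perplexity, as in the tutorial snippet.
    return math.exp(eval_results["eval_loss"])


def report(before: dict, after: dict) -> bool:
    ppl_before = perplexity_from_eval(before)
    ppl_after = perplexity_from_eval(after)
    print(f"Perplexity before fine-tuning: {ppl_before:.2f}")
    print(f"Perplexity after fine-tuning:  {ppl_after:.2f}")
    # Fine-tuning on in-domain text is expected to lower the perplexity.
    return ppl_after < ppl_before


# With the tutorial's trainer this would be (not run here):
#   before = trainer.evaluate()
#   trainer.train()
#   after = trainer.evaluate()
before = {"eval_loss": 2.30}  # hypothetical values
after = {"eval_loss": 1.85}
print("improved:", report(before, after))
```

Evaluating on the same held-out dataset both times matters; otherwise the two perplexities are not comparable.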

To get predictions for samples, you need to tokenize them and prepare the model inputs. The fill-mask pipeline does this for you:

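from transformers import pipeline

# if you trained your model on gpu you need to add this line:
trainer.model.to('cpu')

# <mask> is the mask token of RoBERTa-style tokenizers (BERT uses [MASK]);
# tokenizer.mask_token always gives the right one for your tokenizer
unmasker = pipeline('fill-mask', model=trainer.model, tokenizer=tokenizer)
unmasker("today I ate <mask>.")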

This produces the following output:

[{'score': 0.23618391156196594,
  'sequence': 'today I ate it.',
  'token': 24,
  'token_str': ' it'},
 {'score': 0.03940323367714882,
  'sequence': 'today I ate breakfast.',
  'token': 7080,
  'token_str': ' breakfast'},
 {'score': 0.033759087324142456,
  'sequence': 'today I ate lunch.',
  'token': 4592,
  'token_str': ' lunch'},
 {'score': 0.025962186977267265,
  'sequence': 'today I ate pizza.',
  'token': 9366,
  'token_str': ' pizza'},
 {'score': 0.01913984678685665,
  'sequence': 'today I ate them.',
  'token': 106,
  'token_str': ' them'}]
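The scores in this output are probabilities: for the masked position the pipeline takes the model's logits over the whole vocabulary, applies a softmax and returns the five most likely tokens (its `top_k` parameter defaults to 5). The sketch below reproduces only that final ranking step in pure Python on a tiny, made-up vocabulary with made-up logits; `fill_mask_top_k` is a hypothetical helper, not the pipeline's actual code.

```python
import math


def fill_mask_top_k(logits: dict, top_k: int = 5):
    """Rank candidate tokens for one masked position.

    logits: hypothetical mapping token -> raw model score at the mask position.
    Returns (token, probability) pairs, highest probability first.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy vocabulary and logits for "today I ate <mask>." (made-up numbers).
toy_logits = {"it": 4.0, "breakfast": 2.2, "lunch": 2.0, "pizza": 1.8,
              "them": 1.5, "water": 0.3, "the": -1.0}

for token, prob in fill_mask_top_k(toy_logits):
    print(f"{token:>10}  {prob:.3f}")
```

In the real output above, the five scores add up to only about 0.35, because the softmax is taken over the entire vocabulary of tens of thousands of tokens rather than over a handful of candidates.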
