我正在处理一批不同长度的句子,因此我计划利用gpt2中的padding+attention_mask功能
同时,对于每个句子,我需要添加一个后缀短语并运行N个不同的推理。例如,对于“我喜欢喝可乐”这句话,我可能需要运行两个不同的推论:“我喜欢喝可乐。可乐是好的”和“我喜欢喝可乐。饮料是好的”。因此,我试图通过使用“过去”功能来提高推理时间:https://huggingface.co/transformers/quickstart.html#using-the-past因此我只处理原始句子(例如,“我喜欢喝可乐”)一次,然后我不知何故扩展结果,以便能够与其他两个句子一起使用:“可乐很好”和“饮料很好”
下面,您将看到一个简单的代码,它试图表示我是如何尝试这样做的。为了简单起见,我只是在每个句子中添加一个后缀短语(…但我仍然希望我的原始想法是可能的):
from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers.modeling_gpt2 import GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|endoftext|>')
model = GPT2LMHeadModel.from_pretrained('gpt2')
# Complete phrases are: "I like to drink soda without sugar" and "Go watch TV alone, I am not going"
docs = ["I like to drink soda", "Go watch TV"]
docs_tensors = tokenizer.batch_encode_plus(
[d for d in docs], pad_to_max_length=True, return_tensors='pt')
docs_next = ["without sugar", "alone, I am not going"]
docs_next_tensors = tokenizer.batch_encode_plus(
[d for d in docs_next], pad_to_max_length=True, return_tensors='pt')
# predicting the first part of each phrase
_, past = model(docs_tensors['input_ids'], attention_mask=docs_tensors['attention_mask'])
# predicting the rest of the phrase
logits, _ = model(docs_next_tensors['input_ids'], attention_mask=docs_next_tensors['attention_mask'], past=past)
logits = logits[:, -1]
_, top_indices_results = logits.topk(30)
我得到的错误如下:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/damiox/Workspace/xxLtd/yy/stress-test-withpast2.py", line 26, in <module>
logits, _ = model(docs_next_tensors['input_ids'], attention_mask=docs_next_tensors['attention_mask'], past=past)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 593, in forward
inputs_embeds=inputs_embeds,
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 476, in forward
hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 226, in forward
self.ln_1(x), layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 189, in forward
attn_outputs = self._attn(query, key, value, attention_mask, head_mask)
File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 150, in _attn
w = w + attention_mask
RuntimeError: The size of tensor a (11) must match the size of tensor b (6) at non-singleton dimension 3
Process finished with exit code 1
起初我认为这与https://github.com/huggingface/transformers/issues/3031有关,所以我重新构建了最新的master来尝试修复,但我仍然遇到了这个问题
为了使当前代码段正常工作,您必须按如下方式组合上一个和新的注意掩码:
对于要测试一个句子开头的两个可能后缀的情况,您可能需要克隆过去变量的次数与克隆后缀的次数相同。这意味着前缀输入_id的批量大小必须与后缀输入_id的批量大小相匹配,才能使其正常工作
此外,如果您的前缀输入_id之一被填充,您必须更改后缀输入_id的位置编码输入(GPT2使用绝对位置编码)(上面的代码中没有显示这一点-请查看https://github.com/huggingface/transformers/issues/3021以了解它是如何完成的)
相关问题 更多 >
编程相关推荐