How do I generate a list of the tokens most likely to fill the position of a missing token in a given sentence?

2 answers

User

#1 · edited 2024-06-25 22:55:52

I just tried your example on the HuggingFace model hub with the BERT base uncased model, and it produced a list of candidate tokens.

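I could write a Colab notebook to explain how to code this. A masked language model outputs a score for every token in its vocabulary at the masked position, and a softmax turns those scores into a probability distribution, so you can simply return the tokens with the highest probability.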
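To make the idea concrete, here is a minimal sketch in plain Python. The vocabulary and the scores are made up for illustration (they do not come from BERT), and `softmax` and `top_k_tokens` are hypothetical helper names, not part of any library.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_tokens(vocab, scores, k):
    """Return the k (token, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented toy vocabulary and scores for a single masked position.
toy_vocab = ["cat", "dog", "banana", "bird", "the"]
toy_scores = [4.0, 3.0, -2.0, 2.0, 0.5]

best = top_k_tokens(toy_vocab, toy_scores, k=3)
for token, prob in best:
    print(f"{token}: {prob:.3f}")
```

With a real model the only difference is that the scores come from the model's output at the masked position and the vocabulary has tens of thousands of entries.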

User

#2 · edited 2024-06-25 22:55:52

Basically, you can do the same as in this answer, but instead of adding only the best-fitting token, take for example the five best-fitting tokens:

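import torch
from transformers import BertForMaskedLM, BertTokenizer

# Assumption: the answer does not show how `tokenizer` and `model` were created;
# here they are loaded from the BERT base uncased checkpoint via transformers.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

def fill_the_gaps(text):
    text = '[CLS] ' + text + ' [SEP]'
    tokenized_text = tokenizer.tokenize(text)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    segments_ids = [0] * len(tokenized_text)
    tokens_tensor = torch.tensor([indexed_tokens])
    segments_tensors = torch.tensor([segments_ids])
    with torch.no_grad():
        # .logits has shape (batch, sequence_length, vocab_size)
        predictions = model(tokens_tensor, token_type_ids=segments_tensors).logits
    results = []
    for i, t in enumerate(tokenized_text):
        if t == '[MASK]':
            # instead of argmax, use argsort to rank all tokens from best to worst fit
            predicted_index = torch.argsort(predictions[0, i], descending=True)
            tokens = []
            # take the 5 best-fitting tokens and add them to the list
            for k in range(5):
                predicted_token = tokenizer.convert_ids_to_tokens([predicted_index[k].item()])[0]
                tokens.append(predicted_token)
            results.append(tokens)
    return results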

For your sentence, this gives: [['footballer', 'golfer', 'football', 'cyclist', 'boxer']]
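If you also want the probabilities, or you do not want to sort the whole vocabulary, `torch.topk` is one possible alternative to `argsort`. The sketch below runs on a small hand-written tensor standing in for the model output, so the vocabulary, the numbers, and the helper name `top_k_per_mask` are all assumptions for illustration only.

```python
import torch

def top_k_per_mask(logits, mask_positions, id_to_token, k=5):
    """For each masked position, return the k best (token, probability) pairs.

    logits: tensor of shape (sequence_length, vocab_size)
    mask_positions: indices of the masked positions in the sequence
    id_to_token: list mapping a vocabulary id to its token string
    """
    results = []
    for pos in mask_positions:
        probs = torch.softmax(logits[pos], dim=-1)  # scores -> probabilities
        top = torch.topk(probs, k)                  # k largest values and their ids
        pairs = [(id_to_token[i], p)
                 for p, i in zip(top.values.tolist(), top.indices.tolist())]
        results.append(pairs)
    return results

# Invented stand-in for a model output: 3 positions, 6-token vocabulary.
toy_vocab = ["a", "the", "cat", "dog", "runs", "sleeps"]
toy_logits = torch.tensor([
    [2.0, 1.0, 0.0, -1.0, -2.0, -3.0],
    [0.0, -0.5, 3.0, 2.0, -1.0, 1.0],   # pretend this position is [MASK]
    [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0],
])

candidates = top_k_per_mask(toy_logits, mask_positions=[1], id_to_token=toy_vocab, k=3)
for token, prob in candidates[0]:
    print(f"{token}: {prob:.3f}")
```

In `fill_the_gaps` the same call would receive `predictions[0]` as `logits`, with the ids converted back through `tokenizer.convert_ids_to_tokens` instead of a plain list.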
