How does nn.Linear work shape-wise (batch size, sequence length, hidden size)?

Published 2024-09-27 19:24:31


In self.classifier, I assume different weights should be applied to the tokens. Below is the huggingface implementation:

def __init__(self, config, num_labels=2):
    super(BertForTokenClassification, self).__init__(config)
    self.num_labels = num_labels
    self.bert = BertModel(config)
    self.dropout = nn.Dropout(config.hidden_dropout_prob)
    # One shared (hidden_size x num_labels) projection for every token
    self.classifier = nn.Linear(config.hidden_size, num_labels)
    self.apply(self.init_bert_weights)

def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
    # sequence_output: (batch_size, seq_len, hidden_size)
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
    sequence_output = self.dropout(sequence_output)
    # nn.Linear operates on the last dimension only,
    # so logits: (batch_size, seq_len, num_labels)
    logits = self.classifier(sequence_output)

    if labels is not None:
        loss_fct = CrossEntropyLoss()
        # Only keep active parts of the loss
        if attention_mask is not None:
            # Flatten to (batch_size * seq_len,) and keep positions where mask == 1
            active_loss = attention_mask.view(-1) == 1
            active_logits = logits.view(-1, self.num_labels)[active_loss]
            active_labels = labels.view(-1)[active_loss]
            loss = loss_fct(active_logits, active_labels)
        else:
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        return loss
    else:
        return logits
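On the shape question: nn.Linear treats every dimension except the last as batch dimensions, so a single shared weight matrix of shape (num_labels, hidden_size) is applied to each token position independently — tokens do not get different weights. A minimal standalone sketch (the sizes 4, 128, 768 are arbitrary stand-ins, not from the post above):

```python
import torch
from torch import nn

# Arbitrary example sizes (not from the original post)
batch_size, seq_len, hidden_size, num_labels = 4, 128, 768, 2

classifier = nn.Linear(hidden_size, num_labels)

# Stand-in for BERT's sequence_output: (batch_size, seq_len, hidden_size)
sequence_output = torch.randn(batch_size, seq_len, hidden_size)

# nn.Linear maps only the last dimension: hidden_size -> num_labels
logits = classifier(sequence_output)
print(logits.shape)  # torch.Size([4, 128, 2])

# The same weights are used for every token: classifying one token's
# vector on its own matches the corresponding slice of the batched call
single_token = sequence_output[0, 0]  # (hidden_size,)
assert torch.allclose(classifier(single_token), logits[0, 0], atol=1e-6)
```

This is also why the loss code can safely flatten with `logits.view(-1, self.num_labels)`: each of the batch_size * seq_len rows was produced by the same linear layer.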
