PyTorch Siamese neural network with BERT for sentence matching

import torch

# `device` is not defined in the original snippet; this is the usual definition.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class SiameseNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Shared branch applied to both inputs; expects flattened 512 * 768 vectors.
        self.brothers = torch.nn.Sequential(
            torch.nn.Linear(512 * 768, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(512, 256),
            torch.nn.BatchNorm1d(256),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(256, 32),
        )
        # Head that maps the squared difference of the two branches to 2 outputs.
        self.final = torch.nn.Sequential(
            torch.nn.Linear(32, 16),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(16, 2),
        )

    def forward(self, left, right):
        outputLeft = self.brothers(left)
        outputRight = self.brothers(right)
        output = self.final((outputLeft - outputRight) ** 2)
        return output


bros = SiameseNetwork()
bros = bros.to(device)

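from tqdm import tqdm

# tLoader, epoch, optimizer, criterion and trainingLoss are not shown in the original snippet.
for batch in tqdm(tLoader, desc=f"Train epoch: {epoch+1}"):
    a = batch[0].to(device)
    b = batch[1].to(device)
    # Labels as a float column vector of shape (batch, 1).
    y = torch.unsqueeze(batch[2].type(torch.FloatTensor), 1).to(device)

    optimizer.zero_grad()
    output = bros(a, b)
    loss = criterion(output, y)
    loss.backward()
    trainingLoss += loss.item()
    optimizer.step()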

1 answer

Anonymous user

#1 · Posted on 2024-09-27 18:04:04

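Your first layer is severely over-parameterized and prone to overfitting: Linear(512 * 768, 512) alone holds about 201 million parameters. I assume the shape 512 * 768 reflects the number of tokens times their dimensionality; if so, you need to rethink your architecture. You need some sort of weight sharing or pooling strategy to reduce the num_words * dim input to a fixed-size representation (that is exactly why recurrent networks replaced the fully connected variants for sentence encoding). In transformer-based architectures specifically, the [CLS] token (token number 0, prefixed to the input) is typically used as the "summary" token for sequence-level and sequence-pair tasks.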
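To make the pooling suggestion concrete, here is a minimal sketch, not the asker's or answerer's actual code. It assumes each input is a BERT-style token tensor of shape (batch, 512, 768) with an attention mask of shape (batch, 512). The names `PooledSiameseNetwork`, `pool` and `count_parameters` are hypothetical, the layer sizes are illustrative, and BatchNorm is left out for brevity. Masked mean pooling is included only as one possible alternative to taking the [CLS] vector.

```python
import torch
from torch import nn


class PooledSiameseNetwork(nn.Module):
    """Siamese head over pooled token embeddings (illustrative sketch)."""

    def __init__(self, dim=768, pooling="cls"):
        super().__init__()
        if pooling not in ("cls", "mean"):
            raise ValueError("pooling must be 'cls' or 'mean'")
        self.pooling = pooling
        # Shared branch: its input is one dim-sized vector per sentence,
        # not the flattened seq_len * dim matrix.
        self.brothers = nn.Sequential(
            nn.Linear(dim, 256),
            nn.ReLU(),
            nn.Linear(256, 32),
        )
        self.final = nn.Sequential(
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 2),
        )

    def pool(self, tokens, mask):
        # tokens: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token.
        if self.pooling == "cls":
            # Position 0 holds the [CLS] token in BERT-style inputs.
            return tokens[:, 0]
        # Masked mean: average only over non-padding positions.
        weights = mask.unsqueeze(-1).to(tokens.dtype)
        return (tokens * weights).sum(dim=1) / weights.sum(dim=1).clamp(min=1.0)

    def forward(self, left, left_mask, right, right_mask):
        out_left = self.brothers(self.pool(left, left_mask))
        out_right = self.brothers(self.pool(right, right_mask))
        return self.final((out_left - out_right) ** 2)


def count_parameters(module):
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Random tensors stand in for real BERT outputs.
model = PooledSiameseNetwork()
left = torch.randn(4, 512, 768)
right = torch.randn(4, 512, 768)
mask = torch.ones(4, 512, dtype=torch.long)
logits = model(left, mask, right, mask)  # shape (4, 2)

# First layer of the original network: weights plus biases.
original_first_layer = 512 * 768 * 512 + 512
print(original_first_layer, count_parameters(model))
```

With pooling applied first, the first linear layer in this sketch has 768 * 256 + 256 = 196,864 parameters instead of 201,327,104, and the same weights are applied regardless of where in the sentence a token appears.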
