OpenNMT translate command produces garbage results

Posted 2024-10-02 02:32:02


I am running the following command:

onmt_translate  -model demo-model_step_100000.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

The results in the pred.txt file are completely unrelated to the source sentences being translated.

The corpus is 3,000 parallel sentences. The preprocessing command was:

onmt_preprocess -train_src EMMT/01engParallel_onmt.txt -train_tgt EMMT/01maiParallel_onmt.txt -valid_src EMMT/01engValidation_onmt.txt -valid_tgt EMMT/01maiValidation_onmt.txt -save_data EMMT/demo

The demo model was then trained with:

onmt_train -data EMMT/demo -save_model demo-model

1 Answer
User
#1 · Posted 2024-10-02 02:32:02

Even on data the model has already "seen", you will not get decent translations, because:

  • Your model is trained on far too few sentence pairs (3,000 is simply not enough to train a good model). Only with a corpus of 4M+ sentence pairs can you expect more or less meaningful translations (the more, the better).
  • onmt_train -data EMMT/demo -save_model demo-model trains a small (2 layers x 500 units) unidirectional RNN model (see the documentation); a sketch with those defaults written out follows this list. The transformer model type is recommended for state-of-the-art results.
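For reference, here is a rough sketch of what that default command amounts to with the implicit defaults written out. The explicit values (2 layers, 500-unit LSTM, unidirectional encoder, 100,000 training steps) are an assumption based on OpenNMT-py 1.x defaults; check onmt_train --help for your version.

# Approximately equivalent to the plain demo command, assuming OpenNMT-py 1.x defaults
onmt_train -data EMMT/demo -save_model demo-model \
        -layers 2 -rnn_size 500 -rnn_type LSTM \
        -encoder_type rnn -decoder_type rnn \
        -train_steps 100000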

The FAQ explains how to run transformer model training:

The transformer model is very sensitive to hyperparameters. To run it effectively you need to set a bunch of different options that mimic the Google setup. We have confirmed the following command can replicate their WMT results.

python  train.py -data /tmp/de2/data -save_model /tmp/extra \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000  -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0  -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 4 -gpu_ranks 0 1 2 3

Here is what each of the parameters means:

param_init_glorot & param_init 0: correct initialization of parameters

position_encoding: add sinusoidal position encoding to each embedding

optim adam, decay_method noam, warmup_steps 8000: use special learning rate.

batch_type tokens, normalization tokens, accum_count 4: batch and normalize based on number of tokens and not sentences. Compute gradients based on four batches.

label_smoothing 0.1: use label smoothing loss.
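Applied to the asker's data, a minimal single-GPU adaptation of the FAQ recipe might look like the sketch below. The EMMT/demo path, the demo-transformer model name, and -world_size 1 -gpu_ranks 0 are assumptions for a one-GPU setup; note that with only 3,000 sentence pairs the output will still be poor whatever the architecture.

# Hypothetical single-GPU transformer run on the EMMT/demo data (OpenNMT-py 1.x flags)
onmt_train -data EMMT/demo -save_model demo-transformer \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0 -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 1 -gpu_ranks 0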
