OpenNMT translate command produces garbage results

Posted 2024-10-02 02:32:02


I am running the following command:

onmt_translate  -model demo-model_step_100000.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

The results in the pred.txt file are completely unrelated to the source sentences being translated.

The corpus is 3,000 parallel sentences. The preprocessing command was:

onmt_preprocess -train_src EMMT/01engParallel_onmt.txt -train_tgt EMMT/01maiParallel_onmt.txt -valid_src EMMT/01engValidation_onmt.txt -valid_tgt EMMT/01maiValidation_onmt.txt -save_data EMMT/demo

The demo model was then trained with:

onmt_train -data EMMT/demo -save_model demo-model

1 Answer
User
#1 · Posted 2024-10-02 02:32:02

Even on data the model has already "seen", you will not get decent translations, because:

  • Your model is trained on far too few sentence pairs (3,000 is simply not enough to train a good model). Only with a corpus of 4M+ sentence pairs can you expect more or less meaningful translations (the more, the better).
  • onmt_train -data EMMT/demo -save_model demo-model trains a small (2 layers x 500 units) unidirectional RNN model (see the documentation); a sketch with those defaults written out follows this list. The transformer model type is recommended for state-of-the-art results.
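For reference, here is a rough sketch of what that default command amounts to with the implicit defaults written out. The explicit values (2 layers, 500-unit LSTM, unidirectional encoder, 100,000 training steps) are an assumption based on OpenNMT-py 1.x defaults; check onmt_train --help for your version.

# Approximately equivalent to the plain demo command, assuming OpenNMT-py 1.x defaults
onmt_train -data EMMT/demo -save_model demo-model \
        -layers 2 -rnn_size 500 -rnn_type LSTM \
        -encoder_type rnn -decoder_type rnn \
        -train_steps 100000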

The FAQ explains how to run transformer model training:

The transformer model is very sensitive to hyperparameters. To run it effectively you need to set a bunch of different options that mimic the Google setup. We have confirmed the following command can replicate their WMT results.

python  train.py -data /tmp/de2/data -save_model /tmp/extra \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000  -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0  -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 4 -gpu_ranks 0 1 2 3

Here is what each of the parameters means:

param_init_glorot & param_init 0: correct initialization of parameters

position_encoding: add sinusoidal position encoding to each embedding

optim adam, decay_method noam, warmup_steps 8000: use special learning rate.

batch_type tokens, normalization tokens, accum_count 4: batch and normalize based on number of tokens and not sentences. Compute gradients based on four batches.

label_smoothing 0.1: use label smoothing loss.
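Applied to the asker's data, a minimal single-GPU adaptation of the FAQ recipe might look like the sketch below. The EMMT/demo path, the demo-transformer model name, and -world_size 1 -gpu_ranks 0 are assumptions for a one-GPU setup; note that with only 3,000 sentence pairs the output will still be poor whatever the architecture.

# Hypothetical single-GPU transformer run on the EMMT/demo data (OpenNMT-py 1.x flags)
onmt_train -data EMMT/demo -save_model demo-transformer \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0 -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 1 -gpu_ranks 0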
