<p>下面是一个使用<code>ScheduledEmbeddingTrainingHelper</code>,使用tensorflow1.3和更高级别的基本示例tf.contrib公司原料药。这是一个sequence2sequence模型,其中解码器的初始隐藏状态是编码器的最终隐藏状态。它只展示了如何在单个批次上进行训练(显然,任务是“颠倒这个顺序”)。对于实际的训练任务,我建议tf.contrib.学习API,如learn\u runner、Experience和估算器. 在</p>
<pre><code>import tensorflow as tf
import numpy as np
from tensorflow.python.layers.core import Dense
vocab_size = 7
embedding_size = 5
lstm_units = 10
src_batch = np.array([[1, 2, 3], [4, 5, 6]])
trg_batch = np.array([[3, 2, 1], [6, 5, 4]])
# *_seq will have shape (2, 3), *_seq_len will have shape (2)
source_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
target_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
source_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)
target_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)
# add Start of Sequence (SOS) tokens to each sequence
batch_size, sequence_size = tf.unstack(tf.shape(target_seq))
sos_slice = tf.zeros([batch_size, 1], dtype=tf.int32) # 0 = start of sentence token
decoder_input = tf.concat([sos_slice, target_seq], axis=1)
embedding_matrix = tf.get_variable(
name="embedding_matrix",
shape=[vocab_size, embedding_size],
dtype=tf.float32)
source_seq_embedded = tf.nn.embedding_lookup(embedding_matrix, source_seq) # shape=(2, 3, 5)
decoder_input_embedded = tf.nn.embedding_lookup(embedding_matrix, decoder_input) # shape=(2, 4, 5)
unused_encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
tf.contrib.rnn.LSTMCell(lstm_units),
source_seq_embedded,
sequence_length=source_seq_len,
dtype=tf.float32)
# Decoder:
# At each time step t and for each sequence in the batch, we get x_t by either
# (1) sampling from the distribution output_layer(t-1), or
# (2) reading from decoder_input_embedded.
# We do (1) with probability sampling_probability and (2) with 1 - sampling_probability.
# Using sampling_probability=0.0 is equivalent to using TrainingHelper (no sampling).
# Using sampling_probability=1.0 is equivalent to doing inference,
# where we don't supervise the decoder at all: output at t-1 is the input at t.
sampling_prob = tf.Variable(0.0, dtype=tf.float32)
helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
decoder_input_embedded,
target_seq_len,
embedding_matrix,
sampling_probability=sampling_prob)
output_layer = Dense(vocab_size)
decoder = tf.contrib.seq2seq.BasicDecoder(
tf.contrib.rnn.LSTMCell(lstm_units),
helper,
encoder_state,
output_layer=output_layer)
outputs, state, seq_len = tf.contrib.seq2seq.dynamic_decode(decoder)
loss = tf.contrib.seq2seq.sequence_loss(
logits=outputs.rnn_output,
targets=target_seq,
weights=tf.ones(trg_batch.shape))
train_op = tf.contrib.layers.optimize_loss(
loss=loss,
global_step=tf.contrib.framework.get_global_step(),
optimizer=tf.train.AdamOptimizer,
learning_rate=0.001)
with tf.Session() as session:
session.run(tf.global_variables_initializer())
_, _loss = session.run([train_op, loss], {
source_seq: src_batch,
target_seq: trg_batch,
source_seq_len: [3, 3],
target_seq_len: [3, 3],
sampling_prob: 0.5
})
print("Loss: " + str(_loss))
</code></pre>
<p>对于<code>ScheduledOutputTrainingHelper</code>,我希望只交换助手并使用:</p>
^{pr2}$
<p>但是这会产生一个错误,因为LSTM单元要求每个时间步都有一个多维输入(形状为(批处理大小,输入尺寸))。我将在GitHub中提出一个问题,以确定这是否是一个bug,或者有其他方法可以使用<code>ScheduledOutputTrainingHelper</code>。在</p>