Real-valued sequence-to-sequence autoencoder
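Most sequence-to-sequence autoencoders I can find are designed for categorical sequences, such as those used in translation.
This autoencoder is for real-valued sequences.
The input and output can be multi-dimensional, can have different dimensions from each other, or can even be completely different sequences.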
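Compatibility
- This package is compatible with both Python 2 and 3. The example scripts in main.py and main.ipynb are also compatible with both.
- I have no plans to use TensorFlow 2.0, so all the warnings about future deprecation are suppressed.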
Installation
pip install seq2seq
Usage
First create the factory:

    from models import NDS2SAEFactory

    n_iterations = 3000
    factory = NDS2SAEFactory()
    factory.set_output('toy.zip')
    factory.input_dim = 2   # Input is 2-dimensional
    factory.output_dim = 1  # Output is one-dimensional
    factory.layer_sizes = [50, 30]

    # Use an exponentially decaying learning rate: here it starts at 0.02
    # and decreases exponentially to 0.00001
    factory.lrtype = 'expdecay'
    factory.lrargs = dict(start_lr=0.02, finish_lr=0.00001, decay_steps=n_iterations)

    # Alternatively, a constant rate is also possible, e.g. 0.001
    # factory.lrtype = 'constant'
    # factory.lrargs = dict(lr=0.001)

    # For the dropout layer. Default is None. If None, the dropout layer is not used
    factory.keep_prob = 0.7

    # The hidden layer will be symmetric (in this case: 50:30:30:50)
    # otherwise it'll be repeated (50:30:50:30)
    factory.symmetric = True

    # Save or load (and resume) from this zip file
    encoder = factory.build()
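To make the learning-rate and layer-layout comments above concrete, here is a minimal sketch in plain Python. It is not the package's implementation: `expdecay_lr` and `hidden_layout` are hypothetical helper names, and the geometric interpolation between `start_lr` and `finish_lr` is only one plausible reading of 'expdecay'.

```python
def expdecay_lr(step, start_lr=0.02, finish_lr=0.00001, decay_steps=3000):
    """Hypothetical schedule: geometric interpolation from start_lr to finish_lr.

    Assumption: the real 'expdecay' schedule may use a different formula.
    """
    # Clamp the step so the rate stays at finish_lr after decay_steps.
    progress = min(max(step, 0), decay_steps) / float(decay_steps)
    return start_lr * (finish_lr / start_lr) ** progress


def hidden_layout(layer_sizes, symmetric):
    """Expand the configured layer sizes into the full hidden-layer layout."""
    sizes = list(layer_sizes)
    # Symmetric: mirror the sizes (50:30:30:50). Otherwise repeat them (50:30:50:30).
    second_half = sizes[::-1] if symmetric else sizes
    return sizes + second_half


if __name__ == '__main__':
    for step in (0, 1500, 3000):
        print(step, expdecay_lr(step))
    print(hidden_layout([50, 30], symmetric=True))
    print(hidden_layout([50, 30], symmetric=False))
```

Under this assumed formula the rate falls by a constant factor per step, so halfway through training it equals the geometric mean of the start and finish rates.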
Create a training-sample generator and a validation-sample generator. Both should have the same signature:

    def generate_samples(batch_size):
        """
        :return in_seq: a list of input sequences. Each sequence must be a np.ndarray
                out_seq: a list of output sequences. Each sequence must be a np.ndarray
                These sequences don't need to be the same length and don't need any padding
                The encoder will take care of that
                last_batch: True if this batch is the last of the iteration.
                E.g. if you have 70000 samples and the batch size is 20000, you'll have 4 batches
                the last batch contains 10000 samples.
                You should return False for the first 3 batches and True for the last one
        """
        ...
        return in_seq, out_seq, last_batch
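As an illustration of the signature above, here is a minimal sketch of a generator over in-memory toy data. `make_generator` and the toy arrays are hypothetical and not part of the package. The `(timesteps, dimensions)` array layout is also an assumption, so check main.py for the layout the encoder actually expects.

```python
import numpy as np


def make_generator(in_seqs, out_seqs):
    """Wrap paired sequence lists in a function with the generate_samples signature."""
    assert len(in_seqs) == len(out_seqs)
    state = {'cursor': 0}  # index of the next unread sample

    def generate_samples(batch_size):
        start = state['cursor']
        end = min(start + batch_size, len(in_seqs))
        last_batch = end == len(in_seqs)
        # Rewind after the final batch so the next call starts a new pass.
        state['cursor'] = 0 if last_batch else end
        return in_seqs[start:end], out_seqs[start:end], last_batch

    return generate_samples


# Toy data: 7 variable-length sequences, 2-D inputs and 1-D outputs,
# assuming each array is shaped (timesteps, dimensions).
rng = np.random.RandomState(0)
lengths = [5, 8, 3, 6, 4, 7, 9]
toy_in = [rng.rand(n, 2) for n in lengths]
toy_out = [x.sum(axis=1, keepdims=True) for x in toy_in]

train_generator = make_generator(toy_in, toy_out)
```

With 7 samples and a batch size of 3, this yields batches of 3, 3 and 1 samples, and only the last one returns `last_batch=True`. This mirrors the 70000/20000 example in the docstring.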
Train

    encoder.train(train_generator, valid_generator, n_iterations=3000, batch_size=100, display_step=100)
Predict

    # test_seq is a list of np.ndarrays
    predicted = encoder.predict(test_seq)
    # predicted is a list of np.ndarrays. Each sequence will have the same length (due to padding)
    # Look for the stop token to truncate the padding out
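The comments above say to truncate the padding at the stop token. A minimal sketch of that step follows. The stop-token value is not documented here, so `truncate_at_stop` is a hypothetical helper, the `stop_token` in the example is a placeholder, and each sequence is assumed to be shaped `(timesteps, dimensions)`.

```python
import numpy as np


def truncate_at_stop(seq, stop_token, atol=1e-6):
    """Return the rows of seq that come before the first stop-token row.

    Assumption: the stop token is a full row of known values. Check what the
    encoder actually emits before relying on this.
    """
    # A row counts as a stop token when every dimension is close to stop_token.
    is_stop = np.all(np.isclose(seq, stop_token, atol=atol), axis=1)
    hits = np.flatnonzero(is_stop)
    # If no stop token is found, leave the sequence untouched.
    return seq[:hits[0]] if hits.size else seq


# Placeholder example: a 1-D output padded after a stop row of -1.0.
padded = np.array([[0.1], [0.2], [-1.0], [0.0]])
trimmed = truncate_at_stop(padded, stop_token=np.array([-1.0]))
```

Applying the helper to every element of `predicted` gives back variable-length sequences.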
Encode

    # test_seq is a list of np.ndarrays
    encoded = encoder.encode(test_seq)
    # encoded is a list of hidden-layer states corresponding to each input sequence
Jupyter notebook
Open main.ipynb to run the example.
License
MIT
PRs are welcome.