Okay, I'm trying to build an intrinsically curious agent with Keras and TensorFlow. The agent's reward is the difference between the autoencoder's loss on the transition from the previous state to the current state and its loss on the transition from the current state to the imagined next state. However, the reward function always returns None instead of the actual difference. I've tried printing the losses, and they always show correct values. Any ideas?
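In other words, the per-step reward I want is the gap between the two reconstruction losses (variable names as in the code below):

    reward = loss_ae_ns - loss_ae_ps  # imagined-transition loss minus observed-transition loss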
Reward function / replay code:
# Assumes "import random as R" and "import numpy as np" at module level.
def replay(self, batch):
    minibatch = R.sample(self.memory, batch)
    for prev_state, actions, state, reward, imagined_next_state in minibatch:
        # Perturb the imagined next state so the autoencoder target differs from its input.
        imagined_next_state = np.add(np.random.random(self.state_size), imagined_next_state)
        target_m = self.model.predict(state)
        for i in range(len(target_m)):
            target_m[i][0][actions[i]] = reward
        history_m = self.model.fit(state, target_m, epochs=1, verbose=0)
        # fit() returns a History object; history['loss'] holds one value per epoch.
        history_ae_ps = self.autoencoder.fit(prev_state, state, epochs=1, verbose=0)
        history_ae_ns = self.autoencoder.fit(state, imagined_next_state, epochs=1, verbose=0)
        loss_m = history_m.history['loss'][-1]
        loss_ae_ps = history_ae_ps.history['loss'][-1]
        loss_ae_ns = history_ae_ns.history['loss'][-1]
        print("LOSS AE PS:", loss_ae_ps)
        print("LOSS AE NS:", loss_ae_ns)
        # Curiosity reward: reconstruction-loss gap between the imagined and observed transitions.
        loss_ae = loss_ae_ns - loss_ae_ps
        print(reward, loss_ae)
        # Note: this returns after the first sampled transition.
        return loss_ae
Agent-environment loop code:
def loop(self, times='inf'):
    # "is" checks object identity, not equality; compare strings with ==.
    if times == 'inf':
        times = 2**31
    reward = 0.0001
    prev_shot = self.get_shot()
    for _ in range(times):
        acts, ins, act_probs, shot = self.get_act()
        act_0, act_1, act_2, act_3 = acts
        self.act_to_mouse(act_0, act_1)
        self.act_to_click(act_2)
        self.act_to_keys(act_3)
        reward = self.remember_and_replay(prev_shot, acts, shot, reward, ins)
        if reward is None:
            raise RewardError("Rewards are none.")
        prev_shot = shot
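RewardError is my own exception, not a built-in; a minimal definition is just:

    class RewardError(Exception):
        """Raised when remember_and_replay returns None instead of a reward."""
        pass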
I figured it out while writing up the question: I simply wasn't returning the reward from the remember_and_replay method...
The remember_and_replay method looked roughly like this (a sketch; self.batch_size stands in for whatever batch size I actually pass):
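    def remember_and_replay(self, prev_state, actions, state, reward, imagined_next_state):
        self.memory.append((prev_state, actions, state, reward, imagined_next_state))
        # The result of replay() is thrown away, so this method implicitly returns None.
        self.replay(self.batch_size)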
when it should have been this:
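    def remember_and_replay(self, prev_state, actions, state, reward, imagined_next_state):
        self.memory.append((prev_state, actions, state, reward, imagined_next_state))
        # Returning replay()'s result is the whole fix.
        return self.replay(self.batch_size)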
Hope this helps someone. :)