TensorFlow 2.0 DQN agent problem with a custom environment

Posted 2024-07-02 04:44:38

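I have been following the DQN agent example/tutorial and set everything up the way the example does. The only difference is that I built my own custom Python environment and then wrapped it in TensorFlow. However, no matter how I shape my observation and action specs, I cannot get it to work: it fails whenever I give it an observation and ask for an action. Here is the error I get: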

tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix. Instead it has shape [10] [Op:MatMul]

Here is how I set up my agent:

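import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

layer_parameters = (10,)  # one hidden layer of 10 units; each entry is a layer width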

#placeholders 
learning_rate = 1e-3  # @param {type:"number"}
train_step_counter = tf.Variable(0)

#instantiate agent

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)

env = SumoEnvironment(self._num_actions,self._num_states)
env2 = tf_py_environment.TFPyEnvironment(env)
q_net= q_network.QNetwork(env2.observation_spec(),env2.action_spec(),fc_layer_params = layer_parameters)

print("Time step spec")
print(env2.time_step_spec())

agent = dqn_agent.DqnAgent(
    env2.time_step_spec(),
    env2.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

Here is how I set up my environment:

class SumoEnvironment(py_environment.PyEnvironment):

    def __init__(self, no_of_Actions, no_of_Observations):
        # Observation: a single 1-D array of 16 float32 values.
        self._observation_spec = specs.TensorSpec(shape=(16,),dtype=np.float32,name='observation')
        # Action: one int32 value in the range [0, no_of_Actions - 1].
        self._action_spec = specs.BoundedArraySpec(shape=(1,),dtype=np.int32,minimum=0,maximum=no_of_Actions-1,name='action')

        self._state = 0
        self._episode_ended = False

Here is my input/observation:

tf.Tensor([ 0. 0. 0. 0. 0. 0. 0. 0. -1. -1. -1. -1. 0. 0. 0. -1.], shape=(16,), dtype=float32)

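I tried experimenting with the shape and depth of my Q-network, and it seems to me that the [10] in the error is related to the shape of my Q-network. Setting its layer parameters to (4,) produces the following error: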

tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix. Instead it has shape [4] [Op:MatMul]
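The way the reported shape tracks the layer parameters ([10] for (10,), [4] for (4,)) fits how TF-Agents documents `fc_layer_params`: each entry is the number of units in one fully connected layer. The sketch below is plain Python, not TF-Agents code. It lists the weight shapes such a stack of layers would have, assuming a 16-value observation as in the question. The helper `dense_weight_shapes` and the value `num_actions=4` are hypothetical and chosen only for illustration.

```python
def dense_weight_shapes(input_dim, fc_layer_params, num_actions):
    """List the (in, out) weight shapes of a stack of dense layers.

    Hypothetical helper: it mirrors the documented meaning of
    fc_layer_params (one entry per layer, each entry a unit count)
    and appends a final layer that emits one Q-value per action.
    """
    shapes = []
    previous = input_dim
    for units in fc_layer_params:
        shapes.append((previous, units))
        previous = units
    shapes.append((previous, num_actions))
    return shapes


# num_actions=4 is an assumed value; the question does not state it.
print(dense_weight_shapes(16, (10,), 4))  # [(16, 10), (10, 4)]
print(dense_weight_shapes(16, (4,), 4))   # [(16, 4), (4, 4)]
```

Under this reading, the 10 (or 4) in the error is the width of the hidden layer's output, and the complaint is that this output arrives as a rank-1 tensor where a rank-2 one is expected.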

1 answer

User

#1 · Posted 2024-07-02 04:44:38

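Judging by the keyword "matrix" in the error message, I assume TF expects a 2-D tensor rather than a 1-D one.

I suggest setting the layer parameters to (4, 1) (or (1, 4)).

I will try this out to verify my answer.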
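One related thing that may be worth checking: in TF-Agents, each entry of `fc_layer_params` is a layer width, so (4, 1) would describe two hidden layers rather than a 2-D input. The observation printed in the question has shape (16,), with no leading batch dimension, and networks generally expect input of shape (batch, features). The following NumPy sketch only illustrates the shape difference. It is an assumption about the cause, not something confirmed in this thread, and `add_batch_dim` and the zero-valued weights are hypothetical stand-ins.

```python
import numpy as np

# The observation from the question: a rank-1 array of 16 float32 values.
observation = np.array(
    [0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, -1],
    dtype=np.float32,
)


def add_batch_dim(obs):
    """Return obs with a leading batch axis of size 1 (hypothetical helper)."""
    return np.expand_dims(obs, axis=0)


batched = add_batch_dim(observation)
print(observation.shape)  # (16,)
print(batched.shape)      # (1, 16)

# Stand-in weights for a 16 -> 10 dense layer; the values are arbitrary.
weights = np.zeros((16, 10), dtype=np.float32)
hidden = batched @ weights
print(hidden.shape)       # (1, 10): a matrix, as a matrix multiply expects
```

If the observation is being built by hand rather than taken from a time step returned by the `TFPyEnvironment` wrapper, adding the batch axis in this way (or passing the wrapper's own time step to the policy) may be enough to give the network a rank-2 input.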
