Learning rate scheduler for DQN in stable_baselines3

Posted 2024-09-24 22:29:01

Asked by an anonymous user | Male | A programmer who enjoys writing Python code.

I am experimenting with reinforcement learning using gym and stable-baselines3, specifically the stable-baselines3 DQN implementation on MountainCar (https://gym.openai.com/envs/MountainCar-v0/).

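I am trying to implement a learning rate scheduler that lowers the learning rate whenever the model's reward stays above a certain threshold for a given number of iterations. I tried the following approaches: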

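1. Passing a function instead of a number as learning_rate when defining the model, since learning_rate may be a callable. However, it seems to be evaluated only on the first iteration and does not update the learning rate afterwards.
2. Passing the function as lr_schedule in the policy (via policy_kwargs):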

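    import gym
    import torch
    from stable_baselines3 import DQN
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.monitor import Monitor

    log_dir = "./logs/"   # placeholder: an existing directory for Monitor/EvalCallback output
    nn_layers = [64, 64]  # placeholder: hidden-layer sizes (not given in the question)

    def lr_schedule_custom(progress_remaining):  # placeholder: the real schedule was not shown
        return 1e-4

    env = gym.make('MountainCar-v0')
    # You can also load other environments like CartPole, MountainCar, Acrobot. Refer to https://gym.openai.com/docs/ for descriptions.
    # For example, to load CartPole, replace the statement above with "env = gym.make('CartPole-v1')".
    env = Monitor(env, log_dir)

    callback = EvalCallback(env, log_path=log_dir, deterministic=True)  # periodically evaluates the agent and logs the results
    policy_kwargs = dict(activation_fn=torch.nn.ReLU,
                         net_arch=nn_layers, lr_schedule=lr_schedule_custom)

    model = DQN("MlpPolicy", env, policy_kwargs=policy_kwargs)  # raises: __init__() got multiple values for argument 'lr_schedule'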

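However, I get the error __init__() got multiple values for argument 'lr_schedule', even though the documentation (https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) does not distinguish the policy's lr_schedule parameter from the other policy parameters I am passing. What should I do?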
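A likely cause, judging from the stable-baselines3 source (worth checking against the installed version): off-policy algorithms such as DQN construct the policy as self.policy_class(self.observation_space, self.action_space, self.lr_schedule, **self.policy_kwargs), where self.lr_schedule is built from the learning_rate argument. An lr_schedule entry in policy_kwargs would then be a second value for a parameter the algorithm already passes positionally, even though the policy documentation lists it like any other argument. The sketch below reproduces the same TypeError in plain Python; Policy, build_policy and constant are hypothetical stand-ins, not SB3 classes, and stable-baselines3 is not imported.

```python
class Policy:
    """Hypothetical stand-in for an SB3 policy class (not the real DQNPolicy)."""

    def __init__(self, observation_space, action_space, lr_schedule, net_arch=None):
        self.lr_schedule = lr_schedule
        self.net_arch = net_arch


def build_policy(policy_class, observation_space, action_space, lr_schedule, policy_kwargs):
    # Same call pattern as the algorithm: lr_schedule goes in positionally,
    # and the user's policy_kwargs are unpacked as keyword arguments.
    return policy_class(observation_space, action_space, lr_schedule, **policy_kwargs)


def constant(progress_remaining):
    # Hypothetical schedule: ignores progress and returns a fixed learning rate.
    return 1e-4


# Works: policy_kwargs only holds arguments the algorithm does not pass itself.
ok = build_policy(Policy, "obs_space", "act_space", constant, {"net_arch": [64, 64]})

# Fails: lr_schedule is supplied twice (positionally and via **policy_kwargs).
error_message = ""
try:
    build_policy(Policy, "obs_space", "act_space", constant, {"lr_schedule": constant})
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If this is the cause, the schedule belongs in the learning_rate argument of DQN, and lr_schedule should be left out of policy_kwargs.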

Many thanks.
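The DQN documentation describes learning_rate as either a constant or a function of the current progress remaining (from 1 to 0), so a schedule passed there receives only that progress value and never the episode rewards. A minimal sketch of one workaround, under assumptions: a callable object that can be passed as learning_rate but whose value is lowered from outside once rewards stay above a threshold. The class name RewardThresholdLRDecay, its parameters and the example threshold of -150 are hypothetical illustrations; the block does not import stable-baselines3.

```python
class RewardThresholdLRDecay:
    """Hypothetical reward-driven schedule (not part of stable-baselines3).

    The learning rate is multiplied by `factor` each time the episode reward
    stays above `threshold` for `patience` consecutive episodes.
    """

    def __init__(self, initial_lr, threshold, patience, factor=0.5, min_lr=1e-6):
        self.lr = initial_lr
        self.threshold = threshold
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self._streak = 0  # consecutive episodes above the threshold

    def update(self, episode_reward):
        """Record one finished episode's reward and decay lr if the streak is long enough."""
        if episode_reward > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self._streak = 0  # require a fresh streak before decaying again
        return self.lr

    def __call__(self, progress_remaining):
        # Matches the learning_rate callable signature (progress_remaining goes 1 -> 0),
        # but ignores it and returns the reward-driven value instead.
        return self.lr


# MountainCar gives -1 per step, so a reward above -150 means the goal was reached in under 150 steps.
sched = RewardThresholdLRDecay(initial_lr=1e-3, threshold=-150.0, patience=3)
for reward in [-200.0, -140.0, -130.0, -120.0, -110.0]:
    sched.update(reward)
print(sched(1.0))  # → 0.0005 (one decay after three consecutive rewards above -150)
```

To connect it to training (an untested assumption), one could pass the instance as DQN("MlpPolicy", env, learning_rate=sched) and, in a custom subclass of stable_baselines3.common.callbacks.BaseCallback, call sched.update(info["episode"]["r"]) inside _on_step() for every info in self.locals["infos"] that contains the "episode" key written by Monitor. Because the algorithm re-evaluates its schedule each time it trains, the lowered value would then be applied at the next gradient update.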

0 answers (no answers yet).