Reinforcement learning for practitioners.
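Reinforcement Learning for Practitioners (v1 alpha)
Status: under active development, breaking changes may occur.
EasyAgents is a high-level reinforcement learning API, written in Python, that runs on top of OpenAI gym using tf-Agents and OpenAI baselines.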
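Use EasyAgents if
- you are looking for a simple and easy way to get started with reinforcement learning
- you have implemented your own environment and want to experiment with it
- you want to mix and match different implementations and algorithms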
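Try it on colab:
- Cartpole on colab (introduction: the classic reinforcement learning example of balancing a stick on a cart)
- Berater on colab (example of a custom environment and training; a gym environment based on a routing problem)
- LineWorld on colab (implement your own environment, workshop example)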
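
As a rough illustration of what "implementing your own environment" involves, the sketch below defines a tiny line world with classic gym-style `reset` and `step` methods and plays one episode with a random policy. It deliberately does not import gym so that it runs standalone. The class name `TinyLineWorld`, its reward values, and the helper `play_random_episode` are assumptions made for this example; they are not taken from the LineWorld notebook or from the EasyAgents API.

```python
import random


class TinyLineWorld:
    """Hypothetical 1-D world: the agent starts in the middle and must reach the right end."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.position = size // 2
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = self.size // 2
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the line.
        self.position = min(self.size - 1, max(0, self.position + move))
        self.steps += 1
        reached_goal = self.position == self.size - 1
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else -0.1
        return self.position, reward, done, {}


def play_random_episode(env, seed=0):
    """Play one episode with a random policy; return the total reward and the step count."""
    rng = random.Random(seed)
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.choice([0, 1]))
        total_reward += reward
    return total_reward, env.steps


if __name__ == "__main__":
    print(play_random_episode(TinyLineWorld()))
```

A real gym environment would additionally subclass `gym.Env` and declare its action and observation spaces; that part is omitted here to keep the sketch free of dependencies.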
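Ideas for v1
Guiding principles
- easily train, evaluate and debug policies for (your own) gym environment, rather than "designing new algorithms"
- inspired by keras:
  - same api for all algorithms
  - support for different implementations of the same algorithm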
Scenarios
- simple
agent = PpoAgent( "LineWorld-v0" )
agent.train( SingleEpisode() )
agent.train()
agent.save(...)
agent.load(...)
agent.play()
- advanced
agent = PpoAgent( "LineWorld-v0", fc_layers=(500,250,50) )
agent.train( train=[Fast(), ModelCheckPoint(), ReduceLROnPlateau(), TensorBoard()],
             play=[JupyterStatistics(), JupyterRender(), Mp4()],
             api=[AgentApi()] )
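Design ideas
- separate the "public api" from the concrete implementation using a frontend / backend architecture (inspired by scikit learn, matplotlib, keras)
- pluggable backends
- extensible through callbacks (inspired by keras), with separate callback types for training, evaluation and monitoring
- pre-configurable, algorithm-specific train and play loops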
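
The design ideas above can be sketched in a few lines of plain Python: a frontend class exposes the public API, a registry maps names to pluggable backends, and callbacks receive events from the training loop. Every name in this block (`SketchAgent`, `Backend`, `DummyBackend`, `BACKENDS`, `TrainCallback`, `LossHistory`) is hypothetical and chosen for illustration only; none of them is claimed to be part of EasyAgents, and the dummy backend fakes a loss instead of training a model.

```python
from abc import ABC, abstractmethod


class TrainCallback:
    """Hypothetical callback base class; subclasses override the hooks they need."""

    def on_iteration_end(self, iteration, loss):
        pass


class LossHistory(TrainCallback):
    """Records the loss reported after each training iteration."""

    def __init__(self):
        self.losses = []

    def on_iteration_end(self, iteration, loss):
        self.losses.append(loss)


class Backend(ABC):
    """Backend interface: one concrete subclass per underlying library."""

    @abstractmethod
    def train_iteration(self, env_name, iteration):
        """Run one training iteration and return its loss."""


class DummyBackend(Backend):
    """Stand-in backend that fakes a shrinking loss instead of training a model."""

    def train_iteration(self, env_name, iteration):
        return 1.0 / (iteration + 1)


# Registry of pluggable backends, keyed by name.
BACKENDS = {"dummy": DummyBackend}


class SketchAgent:
    """Frontend: the public API users see, independent of the chosen backend."""

    def __init__(self, env_name, backend="dummy"):
        self.env_name = env_name
        self.backend = BACKENDS[backend]()

    def train(self, callbacks=(), num_iterations=3):
        for iteration in range(num_iterations):
            loss = self.backend.train_iteration(self.env_name, iteration)
            # Notify every callback after each iteration.
            for callback in callbacks:
                callback.on_iteration_end(iteration, loss)


if __name__ == "__main__":
    history = LossHistory()
    SketchAgent("LineWorld-v0").train(callbacks=[history])
    print(history.losses)
```

Because the frontend only talks to the `Backend` interface, a second implementation of the same algorithm could be added to the registry without changing user code.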
Installation
Install from pypi using pip:
pip install easyagents-v1
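Vocabulary
Here is a list of terms from the reinforcement learning space, explained in a colloquial way. The explanations are deliberately informal and only aim to convey the general idea. If you find that one is wrong or that a term is missing, please let me know. The list only contains terms that are actually used in this project.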
term | explanation |
---|---|
action | A game command to be sent to the environment. Depending on the game engine, actions can be discrete (like left/right/up/down buttons) or continuous (like 'move 11.2 degrees to the right'). |
batch | a subset of the training examples. Typically the training examples are split into batches of equal size. |
episode | 1 game played. A sequence of (state,action,reward) from an initial game state until the game ends. |
environment (aka game engine) | The game engine, containing the business logic for your problem. RL algorithms create an instance of the environment and play against it to learn a policy. |
epoch | 1 full training step over all examples. A forward pass followed by a backpropagation for all training examples (batches). |
iterations | The number of passes needed to process all batches (=#training_examples/batch_size) |
observation (aka game state) | All information needed to represent the current state of the environment. |
optimal policy | A policy that 'always' reaches the maximum number of points. Finding good policies for a game about which we know (almost) nothing else is the goal of reinforcement learning. Real-life algorithms typically don't find an optimal policy, striving for a local optimum. |
policy (aka gaming strategy) | The 'stuff' we want to learn. A policy maps the current game state to an action. Policies can be very dumb, like one that randomly chooses an arbitrary action, independent of the current game state. Or they can be clever, like one that maximizes the reward over the whole game. |
training example | A state together with the desired output of the neural network. For an actor network that's (state, action); for a value network, (state, value). |
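
To make the relation between batch, epoch and iterations from the table concrete, here is a minimal sketch in plain Python. The helper names `split_into_batches` and `iterations_per_epoch` are assumptions for this example. When the number of examples is not a multiple of the batch size, the sketch rounds up and lets the last batch be smaller; the table itself only covers the evenly divisible case.

```python
import math


def split_into_batches(examples, batch_size):
    """Split the training examples into consecutive batches; the last one may be smaller."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def iterations_per_epoch(num_examples, batch_size):
    """Number of batches (iterations) needed to see every example once."""
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    # Toy training examples as (state, action) pairs, as for an actor network.
    examples = [(state, state % 2) for state in range(10)]
    batches = split_into_batches(examples, batch_size=4)
    print(len(batches), iterations_per_epoch(len(examples), 4))
```

One epoch then corresponds to processing every batch in `batches` exactly once.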
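Don't use EasyAgents if
- you want to leverage implementation-specific advantages of an algorithm
- you want to do distributed or parallel reinforcement learning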
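Note
- This repository is under active development and at an early stage. Thus anything may (and probably will) change.
- If you have any difficulties installing or using easyagents, please let us know. We will try to help you.
- Any ideas, help, suggestions, comments etc. on python / open source development / reinforcement learning / whatever are more than welcome. Thanks a lot in advance.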