torchreinforce Python package - PyPI

A pythonic implementation of the REINFORCE algorithm that is actually fun to use


torchreinforce


Installation

You can install it with pip, as you would any other Python package:

pip install torchreinforce

Quick start

To use the REINFORCE algorithm with your model, you only need to do two things:

Use the ReinforceModule class as your base class
Decorate your forward function with @ReinforceModule.forward

That's it!

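import torch
from torchreinforce import ReinforceModule

class Model(ReinforceModule):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        self.net = torch.nn.Sequential(
            torch.nn.Linear(20, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 2),
            torch.nn.Softmax(dim=-1),  # action probabilities
        )

    # The decorator turns the raw output into a ReinforceOutput object
    @ReinforceModule.forward
    def forward(self, x):
        return self.net(x)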

Your model will now output ReinforceOutput objects.

This object has two important methods:

get()
reward(value)

Use output.get() to draw an actual sample from the underlying distribution, and output.reward(value) to set the reward for that specific output.
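To make the get()/reward() contract concrete, here is a dependency-free sketch of how a forward decorator could wrap raw action probabilities in an output object. Every name in it (reinforce_forward, SampledOutput, ToyPolicy, assigned_reward) is hypothetical and is not torchreinforce's actual implementation; it only assumes the model returns a list of action probabilities.

```python
import random
from functools import wraps


class SampledOutput:
    """Hypothetical stand-in for ReinforceOutput (not the library's class)."""

    def __init__(self, probs, rng):
        self.probs = probs          # raw forward() output: action probabilities
        self.rng = rng
        self.action = None
        self.assigned_reward = None

    def get(self):
        # Sample once from the categorical distribution, then keep returning it.
        if self.action is None:
            self.action = self.rng.choices(range(len(self.probs)), weights=self.probs)[0]
        return self.action

    def reward(self, value):
        # Store the reward for this specific output.
        self.assigned_reward = value


def reinforce_forward(forward):
    """Hypothetical decorator: wrap forward()'s return value in a SampledOutput."""
    @wraps(forward)
    def wrapper(self, *args, **kwargs):
        out = SampledOutput(forward(self, *args, **kwargs), self.rng)
        self.history.append(out)    # remember outputs so a loss can be built later
        return out
    return wrapper


class ToyPolicy:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = []

    @reinforce_forward
    def forward(self, x):
        # Ignore the observation and return fixed action probabilities.
        return [0.25, 0.75]

    __call__ = forward


net = ToyPolicy(seed=0)
action = net([0.0] * 20)
action.reward(1.0)
print(action.get(), action.assigned_reward)
```

Keeping every output in a history list is one plausible way for a module to later build a loss from all the actions of an episode.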

Assuming net is your model, you have to do something like:

action = net(observation)
observation, reward, done, info = env.step(action.get())
action.reward(reward)

Wait, did you just say distribution?

Yes! As the REINFORCE algorithm prescribes, the outputs of your model are used as the parameters of a probability distribution.
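For reference, this is the textbook REINFORCE (score-function) gradient estimator, which is why the model's outputs must parameterize a distribution whose log-probability can be differentiated. It is the standard formulation, not necessarily the exact computation torchreinforce performs internally; here \pi_\theta is the policy distribution, r_k the reward at step k and \gamma the discount factor.

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} G_t \,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\right],
\qquad G_t = \sum_{k=t}^{T} \gamma^{\,k-t} r_k
```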

In fact, you can use any probability distribution you want; the ReinforceModule constructor accepts the following parameters:

gamma: the gamma (discount) parameter of the REINFORCE algorithm (default: 0.99)
distribution: any ReinforceDistribution or torch.distributions distribution (default: Categorical)
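As a rough illustration of what gamma controls, the sketch below computes the discounted return G_t for each step of one episode. It assumes torchreinforce uses the standard discounted return; the function name discounted_returns is hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # → [1.75, 1.5, 1.0]
```

A gamma close to 1 makes early actions share credit for rewards that arrive much later; a smaller gamma focuses credit on immediate rewards.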

Like this:

net = Model(distribution=torch.distributions.Beta, gamma=0.99)

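Keep in mind that the output of your decorated forward(x) is used as the parameters of the distribution. If the distribution needs more than one parameter, just return a list.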
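One plausible way a list output could be turned into several constructor arguments is plain argument unpacking, sketched below. build_distribution and ToyBeta are hypothetical, dependency-free stand-ins (ToyBeta mirrors the concentration1/concentration0 argument names of torch.distributions.Beta); this is not the library's code. It also assumes a single parameter arrives as a tensor rather than a Python list, so a list unambiguously means "several parameters".

```python
import random


class ToyBeta:
    """Hypothetical stand-in for a two-parameter distribution such as Beta."""

    def __init__(self, concentration1, concentration0, rng=random):
        self.concentration1 = concentration1
        self.concentration0 = concentration0
        self.rng = rng

    def sample(self):
        # random.betavariate(alpha, beta) returns a value in [0, 1].
        return self.rng.betavariate(self.concentration1, self.concentration0)


def build_distribution(distribution, params):
    """Hypothetical helper: unpack a list/tuple output into positional arguments."""
    if isinstance(params, (list, tuple)):
        return distribution(*params)
    # A single (non-list) output is passed through as the only argument.
    return distribution(params)


dist = build_distribution(ToyBeta, [2.0, 5.0])
sample = dist.sample()
print(sample)
```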

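I have added the possibility for distributions to behave deterministically during testing, but I have implemented it only for the Categorical distribution. If you want to implement your own deterministic logic, check the file distributions/categorical.py; it is pretty straightforward.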
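A common convention for "deterministic" categorical behaviour is to take the most probable action instead of sampling. The sketch below assumes that convention; select_action is a hypothetical helper, and the actual logic in distributions/categorical.py may differ.

```python
import random


def select_action(probs, deterministic=False, rng=random):
    """Hypothetical sketch of a Categorical sampler with a deterministic test mode."""
    if deterministic:
        # Test time: always pick the most probable action (argmax).
        return max(range(len(probs)), key=lambda i: probs[i])
    # Training time: sample an action in proportion to its probability.
    return rng.choices(range(len(probs)), weights=probs)[0]


print(select_action([0.1, 0.7, 0.2], deterministic=True))  # → 1
```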

For example, if you want to use the torch.distributions.Beta distribution, you need to do something like:

class Model(ReinforceModule):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        ...

    @ReinforceModule.forward
    def forward(self, x):
        return [self.net1(x), self.net2(x)]  # the Beta distribution accepts two parameters

net = Model(distribution=torch.distributions.Beta, gamma=0.99)
action = net(inp)
env.step(action.get())

Nice! What about training?

You can compute the REINFORCE loss by calling the loss() function of ReinforceModule, and then treat it like any other PyTorch loss:

net = ...
optimizer = ...
while training:
    net.reset()
    for steps:
        ...
    loss = net.loss(normalize=True)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

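You have to call the reset() function of ReinforceModule before each episode. If you want the rewards to be normalized, you can also pass the normalize parameter to loss().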
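The following is a plain-Python sketch of the textbook REINFORCE loss for one episode, including one common form of normalization (standardizing the discounted returns). It assumes, but does not confirm, that loss(normalize=True) does something similar; reinforce_loss and eps are hypothetical names. In real PyTorch code the log-probabilities would be tensors, so the result could be backpropagated.

```python
import math


def reinforce_loss(log_probs, rewards, gamma=0.99, normalize=False, eps=1e-8):
    """Textbook REINFORCE loss for one episode (not torchreinforce's own code)."""
    # Discounted return G_t for every step, computed backwards.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    if normalize:
        # Standardize returns to zero mean and unit variance to reduce variance.
        mean = sum(returns) / len(returns)
        std = math.sqrt(sum((g - mean) ** 2 for g in returns) / len(returns))
        returns = [(g - mean) / (std + eps) for g in returns]
    # Minimizing -sum(G_t * log pi(a_t|s_t)) performs gradient ascent on expected return.
    return -sum(g * lp for g, lp in zip(returns, log_probs))


log_probs = [math.log(0.5), math.log(0.5)]
print(reinforce_loss(log_probs, [1.0, 1.0], gamma=1.0))
print(reinforce_loss(log_probs, [1.0, 1.0], gamma=1.0, normalize=True))
```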

Putting it all together

A complete example looks like this:

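import gym
import torch
from torchreinforce import ReinforceModule

EPISODES = 1000  # number of training episodes; pick your own value

class Model(ReinforceModule):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        self.net = torch.nn.Sequential(
            torch.nn.Linear(4, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 2),
            torch.nn.Softmax(dim=-1),
        )

    @ReinforceModule.forward
    def forward(self, x):
        return self.net(x)

# Classic gym API (before 0.26): reset() returns the observation,
# step() returns (observation, reward, done, info).
env = gym.make('CartPole-v0')
net = Model()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

for i in range(EPISODES):
    done = False
    net.reset()
    observation = env.reset()
    while not done:
        action = net(torch.tensor(observation, dtype=torch.float32))
        observation, reward, done, info = env.step(action.get())
        action.reward(reward)

    # One optimization step per episode
    loss = net.loss(normalize=False)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()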
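For readers without gym or PyTorch at hand, here is a dependency-free sketch that follows the same loop structure (sample an action, collect its reward, update once per episode) on a hypothetical two-armed bandit instead of CartPole. The gradient of the softmax log-probability is written out by hand; train_bandit, the bandit payouts and the learning rate are illustrative assumptions, not part of torchreinforce.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(episodes=200, lr=0.5, seed=0):
    """Toy REINFORCE on a 2-armed bandit: arm 1 pays 1.0, arm 0 pays 0.0."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # policy parameters, one logit per action
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices([0, 1], weights=probs)[0]  # like action.get()
        reward = 1.0 if action == 1 else 0.0             # like env.step(...)
        # One-step episode, so the return G equals the reward.
        # Gradient of log softmax: d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * reward * grad              # gradient ascent on G * log pi
    return softmax(logits)


print(train_bandit())
```

Because only the rewarded arm produces a non-zero update here, the probability of choosing arm 1 can only grow; the library's CartPole example instead relies on autograd to compute the same kind of gradient.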

You can find a runnable example in the examples/ folder.


License: BSD License (BSD 3-Clause)
Author: not provided
Maintainer: galatolo

torchreinforce 0.1.0

torchreinforce的Python项目详细描述

torchreinforce

安装

快速启动

等等，你刚才说的是配送吗？

很好！训练怎么样？

组合在一起

推荐PyPI第三方库

LogrusFormatter

django-throttling

django-rest-jwt-sso

umakit

csu-radartools

insolater

pholcidae

boxb

django-easyfilters-ex

Pleisthenes

django-filepreviewfields

cleanm

odoo10-addon-account-banking-mandate-sale

leap

pure-transport

导航栏

项目链接

标签

维护者

最新PyPI项目

最新Python常见问题

torchreinforce 0.1.0

torchreinforce的Python项目详细描述

torchreinforce

安装

快速启动

等等，你刚才说的是配送吗？

很好！训练怎么样？

组合在一起

推荐PyPI第三方库

LogrusFormatter

django-throttling

django-rest-jwt-sso

umakit

csu-radartools

insolater

pholcidae

boxb

django-easyfilters-ex

Pleisthenes

django-filepreviewfields

cleanm

odoo10-addon-account-banking-mandate-sale

leap

pure-transport

导 航 栏

项目 链接

标 签

维护者

最新PyPI项目

最新Python常见问题

导航栏

项目链接

标签