用于深度学习应用程序的云资源管理。
dl-cloud的Python项目详细描述
用于深度学习的云实用程序
一个超轻量级的云管理工具,设计时考虑了深度学习应用程序。
基于这样一种信念:管理云资源应该像:
importcloudcloud.connect()train_my_network()cloud.down()
我们欢迎所有的贡献、建议和用例。通过github或team@for.ai向我们提供想法!
内容
快速启动
安装:
有点稳定:
sudopipinstalldl-cloud
出血边缘:
gitclonegit@github.com:for-ai/cloud.gitsudopipinstall-ecloud
配置:
请参阅configs/cloud.toml-*
以获取有关如何对每个提供商(google cloud、aws ec2和azure)进行身份验证的说明。
将完成的配置文件(名为cloud.toml
)放在根目录/
或$HOME
中。否则,请在$CLOUD_CFG
中提供文件的完整路径。
用法:
gpu
importcloudcloud.connect()# gpu instances have a dedicated GPU so we don't need to worry# about preemption or acquiring/releasing accelerators online.whileTrue:# train your model or w/ecloud.down()# stop the instance (does not delete instance)
TPU(仅在GCP上)
importcloudcloud.connect()tpu=cloud.instance.tpu.get(preemptible=True)# acquire an acceleratorwhileTrue:ifnottpu.usable:tpu.delete(background=True)# release the accelerator in the backgroundtpu=cloud.instance.tpu.get(preemptible=True)# acquire a new acceleratorelse:# train your model or w/ecloud.down()# release all resources, then stop the instance (does not delete instance)
文档
cloud.connect()
获取/创建一个cloud.Instance
对象并将其设置为cloud.instance
。
returns | desc. |
---|---|
cloud_env | a cloud.Instance. |
cloud.down()
调用cloud.instance.down()
。
cloud.delete(confirm=true)
调用cloud.instance.delete(confirm)
。
云资源
获取/创建一个cloud.Instance
对象并将其设置为cloud.instance
。
properties | desc. |
---|---|
^{ | str, name of the instance |
^{ | bool, whether this resource is usable |
methods | desc. |
^{ | start an existing stopped resource |
^{ | stop the resource. Note: this should not necessarily delete this resource |
^{ | delete this resource |
云实例(资源)
一个对象,表示具有一组可分配/释放的资源的云实例。
properties | desc. |
---|---|
^{ | list of ResourceManagers |
methods | desc. |
^{ | stop this instance and optionally delete all managed resources |
^{ | delete this instance with optional user confirmation |
cloud.resourcemanager
用于管理cloud.Resources
的创建和维护的类。
properties | desc. |
---|---|
^{ | ^{ |
^{ | ^{ |
^{ | list of ^{ |
methods | desc. |
^{ | ^{ |
^{ | |
^{ | add an existing resource to this manager |
^{ | remove an existing resource from this manager |
亚马逊EC2
cloud.awsInstance(实例)
aws ec2实例的cloud.Instance
对象。
天青
cloud.azureInstance(实例)
Microsoft azure实例的cloud.Instance
对象。
谷歌云
我们的gcpinstance要求您的实例已经安装了gcloud
,并且经过了正确的身份验证,以便gcloud alpha compute tpus create test_name
可以正常运行。
cloud.gcpinstance(实例)
google云实例的cloud.Instance
对象。
properties | desc. |
---|---|
^{ | ^{ |
^{ | list of owned ^{ |
methods | desc. |
^{ | ^{ |
^{ |
cloud.tpu(资源)
TPU加速器的资源类。
properties | desc. |
---|---|
^{ | str, IP address of the TPU |
^{ | bool, whether this TPU is preemptible or not |
^{ | dict {str: str}, properties of this TPU |
methods | desc. |
^{ | start this TPU |
^{ | stop this TPU |
^{ | delete this TPU |
cloud.tpumanager(资源管理器)
TPU加速器的ResourceManager类。
properties | desc. |
---|---|
^{ | list of str, names of the managed TPUs |
^{ | list of str, ips of the managed TPUs |
methods | desc. |
^{ | ^{ |
^{ | |
^{ | delete all managed TPUs with unhealthy states |
^{ | get an available TPU, or create one using ^{ |
^{ | allocate and manage a new instance of ^{ |