How do I use distributed DNN training in TensorFlow?

User

Reply 1 · edited 2024-05-11 20:13:39

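It took us a few months, but today marks the release of the initial distributed TensorFlow runtime. It supports multiple machines, each with multiple GPUs, with communication provided by gRPC.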

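The current version includes the necessary backend components, so you can assemble a cluster manually and connect to it from a client program. More details are available in the readme.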
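As a rough illustration of "assembling a cluster manually", the sketch below describes a cluster as a mapping from job names to `host:port` lists and derives the gRPC address a client program would connect to. The host names, ports, and the helpers `grpc_target` and `start_server` are hypothetical, introduced only for this example. The TensorFlow calls assume the `tf.train.ClusterSpec` API together with `tf.distribute.Server` (exposed as `tf.train.Server` in TensorFlow 1.x); the readme remains the authoritative reference for the release discussed here.

```python
# Hypothetical host names and ports; replace them with your own machines.
CLUSTER = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}


def grpc_target(cluster, job_name, task_index):
    """Return the gRPC address a client would use to reach one task."""
    return "grpc://" + cluster[job_name][task_index]


def start_server(cluster, job_name, task_index):
    """Start the in-process server for one task (requires TensorFlow).

    Assumption: TensorFlow is installed and provides tf.train.ClusterSpec
    plus either tf.distribute.Server or the older tf.train.Server.
    """
    import tensorflow as tf  # imported lazily so the helper above works without it

    spec = tf.train.ClusterSpec(cluster)
    distribute = getattr(tf, "distribute", None)
    server_cls = getattr(distribute, "Server", None) or tf.train.Server
    return server_cls(spec, job_name=job_name, task_index=task_index)


if __name__ == "__main__":
    # Print the address of every task in the hypothetical cluster.
    for job, hosts in CLUSTER.items():
        for index in range(len(hosts)):
            print(job, index, grpc_target(CLUSTER, job, index))
```

Each machine would run `start_server` with its own job name and task index, and the client program would then connect to one of the printed `grpc://` addresses.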

User

Reply 2 · edited 2024-05-11 20:13:39

Update

As you may have noticed, TensorFlow has supported distributed DNN training for some time now. See its official website for details.

=================================================================================

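Previous

No, it does not support distributed training yet, which is a little disappointing. But I don't think extending it from a single machine to multiple machines would be difficult. Compared with other open-source libraries such as Caffe, TF's data graph structure is better suited to cross-machine tasks.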

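User

Reply 3 · edited 2024-05-11 20:13:39

Update:

The release happened on 26 February 2016 and was announced by coauthor Derek Murray in the original issue here. It uses gRPC for inter-process communication.

Previous:

Before the update above, a distributed implementation of TensorFlow had not been released. Support for a distributed implementation was the subject of this issue, where coauthor Vijay Vasudevan wrote: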

we are working on making a distributed implementation available, it's currently not in the initial release

Jeff Dean later provided an update:

Our current internal distributed extensions are somewhat entangled with Google internal infrastructure, which is why we released the single-machine version first. The code is not yet in GitHub, because it has dependencies on other parts of the Google code base at the moment, most of which have been trimmed, but there are some remaining ones.
We realize that distributed support is really important, and it's one of the top features we're prioritizing at the moment.
