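I am trying to run distributed training across two nvidia-docker containers. When I try to use two hosts, it does not work. How can I fix this?
I tried this command: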
horovodrun -np 3 -H localhost:1 -p 12345 python keras_mnist_advanced.py
That worked, but when I tried:
^{pr2}$
I got this error:
Launching horovodrun task function was not successful: horovod.run.common.util.network.NoValidAddressesFound: Unable to connect to the horovodrun task service #1 on any of the addresses:{'lo': [('127.0.0.1', 30871)], 'docker0': [('172.17.0.1', 30871)], 'enp0s31f6': [('192.168.0.20', 30871)]}
Please see the following issues raised in the Horovod repository:
1) https://github.com/horovod/horovod/issues/975
2) https://github.com/horovod/horovod/issues/971
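The error in the question lists three candidate addresses that could not be reached, so one quick check is whether a plain TCP connection to each of them succeeds from inside the other container. The snippet below is only a minimal diagnostic sketch, not part of Horovod: `can_connect` is a hypothetical helper name, the addresses and port are copied from the error message, and the port appears to differ between runs, so substitute the values from your own output. The check is only meaningful while `horovodrun` is still starting up and its task service is listening.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


# Interface names, addresses, and port copied from the error message in the
# question. These are example values; replace them with your own.
candidates = {
    "lo": ("127.0.0.1", 30871),
    "docker0": ("172.17.0.1", 30871),
    "enp0s31f6": ("192.168.0.20", 30871),
}

if __name__ == "__main__":
    for iface, (host, port) in candidates.items():
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{iface}: {host}:{port} {status}")
```

If every address is reported as unreachable from the second container, the problem is more likely in the container networking setup than in the training script itself, which matches the kind of situation discussed in the linked issues.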