A simple Python MPI script crashes with a segmentation fault on specific nodes of a cluster.
The body of the script is:
import mpi4py
mpi4py.rc.threads = False
from mpi4py import MPI
comm = MPI.COMM_WORLD
name=MPI.Get_processor_name()
print("hello world")
print(("name:",name,"my rank is",comm.rank))
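I tried loading all the modules in the batch file before running the script on a single node, but that did not help. The sbatch file looks like this:
[sbatch file missing]
The first few lines of the output are shown below. The actual node name is replaced with NODENAME, and INSTITUTE is a placeholder for the place where I work: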
[NODENAME:24753] *** Process received signal ***
[NODENAME:24753] Signal: Segmentation fault (11)
[NODENAME:24753] Signal code: Address not mapped (1)
[NODENAME:24753] Failing at address: 0x7f68a835a008
[NODENAME:24753] [ 0] /lib64/libpthread.so.0(+0xf7e0) [0x7f68a7f197e0]
[NODENAME:24753] [ 1] /usr/INSTITUTE/gcc/9.1-pkgs/openmpi-4.0.1/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x124) [0x7f689d41c184]
[NODENAME:24753] [ 2] /usr/INSTITUTE/gcc/9.1-pkgs/openmpi-4.0.1/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x983) [0x7f689d20ae43]
My guess is that the modules are not being loaded on these nodes.
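One way to test that guess might be to print the module-related environment on a failing node and on a working node, without importing MPI at all, and compare the two. The sketch below is only an illustration: the helper name `environment_report` is made up here, and it assumes the cluster uses Environment Modules or Lmod, which record loaded modules in the `LOADEDMODULES` variable.

```python
import os
import shutil
import socket


def environment_report(env=None):
    """Collect a few environment facts that hint at whether modules were loaded.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to check in isolation.
    """
    env = os.environ if env is None else env
    return {
        "host": socket.gethostname(),
        # Assumption: Environment Modules / Lmod keep a colon-separated
        # list of loaded modules in LOADEDMODULES (empty if none loaded).
        "loaded_modules": [m for m in env.get("LOADEDMODULES", "").split(":") if m],
        "ld_library_path": [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p],
        # Which mpirun, if any, is first on PATH.
        "mpirun": shutil.which("mpirun", path=env.get("PATH", "")),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this through the same sbatch file in place of the MPI script, once on a node that crashes and once on a node that works, should show whether the Open MPI 4.0.1 paths from the backtrace appear in both environments.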