Creating a local_directory for dask_jobqueue

Posted 2024-06-01 08:51:03


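I'm trying to run dask on an HPC system that uses NFS storage. Because of that, I want to configure dask to use local storage for scratch space. Every cluster node has a /scratch/ folder that all users can write to, with instructions to put temporary files in /scratch/<username>/<jobid>/.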

I have some code set up like this:

import dask_jobqueue
from distributed import Client

# <username> and <jobid> are placeholders taken from the site instructions
cluster = dask_jobqueue.SLURMCluster(
    queue='high',
    cores=24,
    memory='60GB',
    walltime='10:00:00',
    local_directory='/scratch/<username>/<jobid>/',
)

cluster.scale(1)
client = Client(cluster)

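However, I have a problem. The directory doesn't exist ahead of time (both because I don't know which node the client will be running on, and because the path is built from the SLURM job ID, which is always unique), so my code fails: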

Process Dask Worker process (from Nanny):
Traceback (most recent call last):
  File "/home/lsterzin/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lsterzin/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lsterzin/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 191, in _run
    target(*args, **kwargs)
  File "/home/lsterzin/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 699, in _run
    worker = Worker(**worker_kwargs)
  File "/home/lsterzin/anaconda3/lib/python3.7/site-packages/distributed/worker.py", line 497, in __init__
    self._workspace = WorkSpace(os.path.abspath(local_directory))
  File "/home/lsterzin/anaconda3/lib/python3.7/site-packages/distributed/diskutils.py", line 118, in __init__
    self._init_workspace()
  File "/home/lsterzin/anaconda3/lib/python3.7/site-packages/distributed/diskutils.py", line 124, in _init_workspace
    os.mkdir(self.base_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/<user>/<jobid>'

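Without knowing which node the dask worker will run on, I can't create the directory, and if the directory doesn't exist, I can't create the cluster with dask_jobqueue. What is the best way to solve this?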
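The traceback ends in `os.mkdir`, which creates only the last component of a path and raises `FileNotFoundError` when a parent directory is missing. The minimal sketch below reproduces that behavior with the standard library alone and contrasts it with `os.makedirs`. The temporary directory and the names `user` and `12345` are made-up stand-ins for /scratch, the username, and the job ID. It illustrates the error, not how distributed itself should be changed.

```python
import os
import tempfile


def try_mkdir(path):
    """Return True if os.mkdir succeeds, False if a parent directory is missing."""
    try:
        os.mkdir(path)
        return True
    except FileNotFoundError:
        return False


# A temporary directory stands in for /scratch; "user" and "12345" are
# hypothetical stand-ins for the username and the SLURM job ID.
with tempfile.TemporaryDirectory() as scratch:
    nested = os.path.join(scratch, "user", "12345")

    # The parent "user" does not exist yet, so os.mkdir cannot create "12345".
    mkdir_ok = try_mkdir(nested)

    # os.makedirs creates every missing parent along the way.
    os.makedirs(nested, exist_ok=True)
    makedirs_ok = os.path.isdir(nested)

print(mkdir_ok, makedirs_ok)
```

So the failure is consistent with a missing parent directory on the compute node, which means the directory tree has to be created on that node before the worker starts.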


2 Answers

I think this can be done with /scratch/$USER/$SLURM_JOB_ID. If that doesn't work, you can define local-directory through the configuration file: https://jobqueue.dask.org/en/latest/configuration-setup.html#local-storage
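A minimal sketch of that suggestion follows. It assumes the shell on the compute node expands `$USER` and `$SLURM_JOB_ID` when the generated job script runs. The helper `scratch_cluster_kwargs` is hypothetical. The `job_script_prologue` argument is an assumption about the installed dask-jobqueue version: recent releases accept it, older ones used `env_extra` for the same purpose. The `mkdir -p` line is an addition to the answer, meant to create the parent directories before the worker starts.

```python
def scratch_cluster_kwargs(scratch_root="/scratch"):
    """Build keyword arguments for dask_jobqueue.SLURMCluster (hypothetical helper)."""
    # $USER and $SLURM_JOB_ID are deliberately left unexpanded here:
    # the shell on the compute node is expected to expand them when
    # the job script runs.
    local_dir = f"{scratch_root}/$USER/$SLURM_JOB_ID"
    return {
        "queue": "high",
        "cores": 24,
        "memory": "60GB",
        "walltime": "10:00:00",
        "local_directory": local_dir,
        # Assumption: the installed dask-jobqueue accepts job_script_prologue
        # (older releases call this env_extra). The command runs on the
        # compute node before the worker starts and creates missing parents.
        "job_script_prologue": [f'mkdir -p "{local_dir}"'],
    }


kwargs = scratch_cluster_kwargs()
print(kwargs["local_directory"])
```

These arguments would be passed as `dask_jobqueue.SLURMCluster(**kwargs)`. Printing `cluster.job_script()` before scaling shows the generated script, which is a quick way to confirm where the variables and the `mkdir -p` line end up.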

The example configurations may also be useful to you: https://jobqueue.dask.org/en/latest/configurations.html

Thanks for the well-phrased question, @lsterzinger.

I've proposed a solution here that may help: https://github.com/dask/distributed/pull/3928

Let's see what the community says.
