Azure Databricks cluster init script: installing a Python wheel

Published 2024-10-02 00:41:23


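I have a Python script that mounts a storage account in Databricks and then installs a wheel from that storage account. I am trying to run it as a cluster init script, but it keeps failing. My script has the following form: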

#/databricks/python/bin/python
mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)
dbutils.library.install("dbfs:/mnt/.....")
dbutils.library.restartPython()

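It works when I run it directly in a notebook, but if I save it to a file named dbfs:/databricks/init_scripts/datalakes/init.py and use it as the cluster init script, the cluster fails to start, and the error message says the init script exited with a non-zero status. I checked the logs, and the script appears to be run as bash rather than Python: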

bash: line 1: mount_point: command not found

I then tried running the Python script from a bash script named init.bash that contains this single line:

/databricks/python/bin/python "dbfs:/databricks/init_scripts/datalakes/init.py"

The cluster that uses init.bash then fails to start, and the log shows that it cannot find the Python file:

/databricks/python/bin/python: can't open file 'dbfs:/databricks/init_scripts/datalakes/init.py': [Errno 2] No such file or directory

Can anyone tell me how to get this working?

Related question: Azure Databricks cluster init script - Install wheel from mounted storage


1 answer
User
#1 · Posted 2024-10-02 00:41:23

The solution I used was to run a notebook that mounts the storage and creates a bash init script that installs the wheel. Something like this:

mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)

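# Write the bash init script to DBFS; the final True overwrites any existing file.
# The closing triple quote sits on its own line so it cannot merge with the quote that ends the wheel path.
dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", """
        /databricks/python/bin/pip install "../../../dbfs/mnt/package-source/parser-3.0-py3-none-any.whl"
""", True)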
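A note on why the pip command above points at a path under /dbfs rather than at a dbfs:/ URI: init scripts run as ordinary shell processes on the cluster nodes, where DBFS is exposed through the local /dbfs mount, while the dbfs:/ scheme is only understood by dbutils and Spark. That would also explain the "No such file or directory" error in the question. Here is a minimal sketch of that mapping and of building the init script text. The helper names `dbfs_uri_to_local_path` and `build_init_script` are made up for illustration, and the absolute wheel path is an assumption based on the path used in the answer.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Map a dbfs:/ URI to the local /dbfs path that shell processes see.

    Hypothetical helper, not part of dbutils.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


def build_init_script(wheel_uri: str) -> str:
    """Return the text of a bash init script that pip-installs one wheel.

    Hypothetical helper; the pip location is the one used in the answer.
    """
    wheel_path = dbfs_uri_to_local_path(wheel_uri)
    return (
        "#!/bin/bash\n"
        f'/databricks/python/bin/pip install "{wheel_path}"\n'
    )


# Assumed wheel location, taken from the path in the answer.
script = build_init_script("dbfs:/mnt/package-source/parser-3.0-py3-none-any.whl")
print(script)

# In a notebook, where dbutils is available, the text could then be written with:
# dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", script, True)
```

The mount itself still has to be created from a notebook, as in the answer, since dbutils is not available inside an init script.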
