Databricks client SDK with a command-line client for the Databricks REST APIs

pydbr

Databricks client SDK for Python with a command-line interface for the Databricks REST APIs.


Introduction

The pydbr package (short for Python Databricks) provides a Python SDK for the Databricks REST API:

  • dbfs
  • workspace
  • jobs
  • runs

The package also ships with a CLI, which is useful for automation.

Installation

$ pip install pydbr

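Databricks CLI

The Databricks command-line client provides a convenient way to interact with a Databricks cluster from the command line. This approach is common in automation tasks such as DevOps pipelines or third-party workflow managers.

You can invoke the Databricks CLI using the shell command pydbr: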

$ pydbr --help

Or use the Python module:

$ python -m pydbr.cli --help

To connect to a Databricks cluster, you can supply arguments on the command line:

  • --bearer-token
  • --url
  • --cluster-id

Alternatively, you can define environment variables. Command-line arguments take precedence.

export DATABRICKS_URL='https://westeurope.azuredatabricks.net/'
export DATABRICKS_BEARER_TOKEN='dapixyz89u9ufsdfd0'
export DATABRICKS_CLUSTER_ID='1234-456778-abc234'
export DATABRICKS_ORG_ID='87287878293983984'
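The precedence rule (command-line arguments override environment variables) can be sketched in a few lines of Python. This is an illustration only, not pydbr's actual implementation: the helper name `resolve_setting` is hypothetical, while the variable names are taken from the export example.

```python
import os


def resolve_setting(cli_value, env_name, environ=None):
    """Return the CLI value if one was given, else the environment variable, else None.

    Hypothetical helper; it only illustrates the documented precedence.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    return environ.get(env_name)


# Use an explicit mapping here instead of the real process environment.
env = {
    "DATABRICKS_URL": "https://westeurope.azuredatabricks.net/",
    "DATABRICKS_BEARER_TOKEN": "token-from-environment",
}

# No --url on the command line: the environment variable is used.
url = resolve_setting(None, "DATABRICKS_URL", env)

# --bearer-token given on the command line: it wins over the environment.
token = resolve_setting("dapixyz89u9ufsdfd0", "DATABRICKS_BEARER_TOKEN", env)
```

Passing the mapping explicitly keeps the function easy to test without touching the real environment.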

DBFS

List DBFS items

# List items on DBFS
pydbr dbfs ls --json-indent 3 FileStore/movielens
[{"path": "/FileStore/movielens/ml-latest-small",
      "is_dir": true,
      "file_size": 0,
      "is_file": false,
      "human_size": "0 B"}]
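Because the listing is printed as JSON, it can be post-processed by a script. The sketch below parses the sample output shown above and keeps only the directories; it assumes every item has the same fields as that sample, and the function name `directories` is made up for this example.

```python
import json

# Sample output copied from the listing above.
listing_json = """
[{"path": "/FileStore/movielens/ml-latest-small",
  "is_dir": true,
  "file_size": 0,
  "is_file": false,
  "human_size": "0 B"}]
"""


def directories(items):
    """Return the paths of all items flagged as directories."""
    return [item["path"] for item in items if item["is_dir"]]


items = json.loads(listing_json)
dirs = directories(items)
print(dirs)
```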

Download a file from DBFS

# Download a file and print to STDOUT
pydbr dbfs get ml-latest-small/movies.csv

Download a directory from DBFS

# Download recursively entire directory and store locally
pydbr dbfs get -o ml-local ml-latest-small

Workspace

The Databricks workspace contains notebooks and other items.

List workspace

#####################
# List workspace
# Default path is root - '/'
$ pydbr workspace ls
# auto-add leading '/'
$ pydbr workspace ls 'Users'
# Space-indented json output with number of spaces
$ pydbr workspace --json-indent 4 ls
# Custom indent string
$ pydbr workspace ls --json-indent='>'
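The two forms of `--json-indent` (a number of spaces or a custom string) match what Python's `json.dumps` accepts for its `indent` argument. The sketch below shows that behaviour; whether pydbr passes the value to `json.dumps` in this way is an assumption, and `parse_indent` is a hypothetical helper.

```python
import json


def parse_indent(value):
    """Interpret an indent option: digits mean a number of spaces,
    anything else is used literally as the indent string."""
    return int(value) if value.isdigit() else value


data = {"path": "/Users"}

# Equivalent in spirit to --json-indent 4
four_spaces = json.dumps(data, indent=parse_indent("4"))

# Equivalent in spirit to --json-indent='>'
custom = json.dumps(data, indent=parse_indent(">"))

print(four_spaces)
print(custom)
```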

Export items from the Databricks workspace

######################
# Export workspace items
# Export everything in source format using defaults: format=SOURCE, path=/
pydbr workspace export -o ./.dev/export
# Export everything in DBC format
pydbr workspace export -f DBC -o ./.dev/export.
# When path is folder, export is recursive
pydbr workspace export -o ./.dev/export-utils 'Utils'
# Export single ITEM
pydbr workspace export -o ./.dev/GetML 'Utils/Download MovieLens.py'

Runs

This command group implements the Databricks REST API for runs.

Submit a notebook

Implements: https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit

$ pydbr runs submit "Utils/Download MovieLens"
{"run_id": 4}

You can retrieve job information using runs get:

$ pydbr runs get 4 -i 3
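In automation, a submitted run is usually polled until it finishes. The sketch below shows one way to structure such a loop around the `state.life_cycle_state` field that appears in the run metadata. It is not part of pydbr: `wait_for_run` and the `get_run` callable are hypothetical, and the set of terminal states follows the Databricks Jobs API documentation.

```python
import time

# Life cycle states after which a run will not change any more.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_run, run_id, poll_seconds=10.0, max_polls=60, sleep=time.sleep):
    """Poll get_run(run_id) until the run reaches a terminal state.

    get_run is any callable returning run metadata as a dict, for example
    a wrapper that parses the JSON printed by `pydbr runs get`.
    """
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["state"]["life_cycle_state"] in TERMINAL_STATES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")


# Stand-in for a real lookup: three canned responses for run 4.
_responses = iter([
    {"run_id": 4, "state": {"life_cycle_state": "PENDING"}},
    {"run_id": 4, "state": {"life_cycle_state": "RUNNING"}},
    {"run_id": 4, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
])

final = wait_for_run(lambda run_id: next(_responses), 4, sleep=lambda seconds: None)
print(final["state"]["result_state"])
```

Injecting `get_run` and `sleep` keeps the loop independent of how the metadata is fetched and makes it testable without a cluster.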

If you need to pass parameters, use the --parameters or -p option and specify JSON text.

$ pydbr runs submit -p '{"run_tag":"20250103"}' "Utils/Download MovieLens"

You can also reference parameters stored in a JSON file:

$ pydbr runs submit -p '@params.json' "Utils/Download MovieLens"
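When the parameters are produced by a script, writing them with `json.dumps` avoids quoting mistakes in hand-written JSON. The following sketch writes a `params.json` file and builds the matching command line; the helper names `write_params` and `submit_command` are hypothetical, and the command is only printed, not executed.

```python
import json
import shlex
from pathlib import Path


def write_params(path, params):
    """Write params as JSON and return the '@file' argument for -p."""
    Path(path).write_text(json.dumps(params), encoding="utf-8")
    return "@" + str(path)


def submit_command(notebook, params_arg):
    """Build a shell-safe command string for submitting a notebook run."""
    return shlex.join(["pydbr", "runs", "submit", "-p", params_arg, notebook])


arg = write_params("params.json", {"run_tag": "20250103"})
command = submit_command("Utils/Download MovieLens", arg)
print(command)
```

`shlex.join` quotes the notebook path because it contains a space, so the command can be pasted into a shell as printed.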

You can use the parameters in the notebook, and you can also see them in the run metadata:

pydbr runs get-output -i 3 8
{"notebook_output":{"result":"Downloaded files (tag: 20250103): README.txt, links.csv, movies.csv, ratings.csv, tags.csv","truncated":false},"error":null,"metadata":{"job_id":8,"run_id":8,"creator_user_name":"your.name@gmail.com","number_in_job":1,"original_attempt_run_id":null,"state":{"life_cycle_state":"TERMINATED","result_state":"SUCCESS","state_message":""},"schedule":null,"task":{"notebook_task":{"notebook_path":"/Utils/Download MovieLens","base_parameters":{"run_tag":"20250103"}}},"cluster_spec":{"existing_cluster_id":"xxxx-yyyyyy-zzzzzz"},"cluster_instance":{"cluster_id":"xxxx-yyyyyy-zzzzzzzz","spark_context_id":"8734983498349834"},"overriding_parameters":null,"start_time":1592067357734,"setup_duration":0,"execution_duration":11000,"cleanup_duration":0,"trigger":null,"run_name":"pydbr-1592067355","run_page_url":"https://westeurope.azuredatabricks.net/?o=89349849834#job/8/run/1","run_type":"SUBMIT_RUN"}}

Get run metadata

Implements: Databricks REST runs/get

$ pydbr runs get -i 3 6
{"job_id":6,"run_id":6,"creator_user_name":"your.name@gmail.com","number_in_job":1,"original_attempt_run_id":null,"state":{"life_cycle_state":"TERMINATED","result_state":"SUCCESS","state_message":""},"schedule":null,"task":{"notebook_task":{"notebook_path":"/Utils/Download MovieLens"}},"cluster_spec":{"existing_cluster_id":"xxxx-yyyyy-zzzzzz"},"cluster_instance":{"cluster_id":"xxxx-yyyyy-zzzzzz","spark_context_id":"783487348734873873"},"overriding_parameters":null,"start_time":1592062497162,"setup_duration":0,"execution_duration":11000,"cleanup_duration":0,"trigger":null,"run_name":"pydbr-1592062494","run_page_url":"https://westeurope.azuredatabricks.net/?o=398348734873487#job/6/run/1","run_type":"SUBMIT_RUN"}
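The timing fields in the metadata are easier to read after conversion. The sketch below assumes, as the Databricks Jobs API documentation states, that `start_time` is in epoch milliseconds and the durations are in milliseconds. The function name `run_summary` is made up for this example, and the input is a trimmed copy of the output above.

```python
from datetime import datetime, timezone


def run_summary(meta):
    """Condense run metadata into start time, total duration and outcome."""
    started = datetime.fromtimestamp(meta["start_time"] / 1000, tz=timezone.utc)
    total_ms = (
        meta["setup_duration"]
        + meta["execution_duration"]
        + meta["cleanup_duration"]
    )
    return {
        "started_utc": started.isoformat(),
        "total_seconds": total_ms / 1000,
        "succeeded": meta["state"].get("result_state") == "SUCCESS",
    }


# Fields taken from the sample metadata above.
meta = {
    "run_id": 6,
    "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
    "start_time": 1592062497162,
    "setup_duration": 0,
    "execution_duration": 11000,
    "cleanup_duration": 0,
}

summary = run_summary(meta)
print(summary)
```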

List runs

Implements: Databricks REST runs/list

$ pydbr runs ls

To get only the runs for a specific job:

^{pr21}$
{"runs":[{"job_id":4,"run_id":4,"creator_user_name":"your.name@gmail.com","number_in_job":1,"original_attempt_run_id":null,"state":{"life_cycle_state":"PENDING","state_message":""},"schedule":null,"task":{"notebook_task":{"notebook_path":"/Utils/Download MovieLens"}},"cluster_spec":{"existing_cluster_id":"xxxxx-yyyy-zzzzzzz"},"cluster_instance":{"cluster_id":"xxxxx-yyyy-zzzzzzz"},"overriding_parameters":null,"start_time":1592058826123,"setup_duration":0,"execution_duration":0,"cleanup_duration":0,"trigger":null,"run_name":"pydbr-1592058823","run_page_url":"https://westeurope.azuredatabricks.net/?o=abcdefghasdf#job/4/run/1","run_type":"SUBMIT_RUN"}],"has_more":false}

Export a run

Implements: Databricks REST runs/export

$ pydbr runs export --content-only 4 > .dev/run-view.html

Get run output

Implements: Databricks REST runs/get-output

$ pydbr runs get-output -i 3 6
{"notebook_output":{"result":"Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv","truncated":false},"error":null,"metadata":{"job_id":5,"run_id":5,"creator_user_name":"your.name@gmail.com","number_in_job":1,"original_attempt_run_id":null,"state":{"life_cycle_state":"TERMINATED","result_state":"SUCCESS","state_message":""},"schedule":null,"task":{"notebook_task":{"notebook_path":"/Utils/Download MovieLens"}},"cluster_spec":{"existing_cluster_id":"xxxx-yyyyy-zzzzzzz"},"cluster_instance":{"cluster_id":"xxxx-yyyyy-zzzzzzz","spark_context_id":"8973498743973498"},"overriding_parameters":null,"start_time":1592062147101,"setup_duration":1000,"execution_duration":11000,"cleanup_duration":0,"trigger":null,"run_name":"pydbr-1592062135","run_page_url":"https://westeurope.azuredatabricks.net/?o=89798374987987#job/5/run/1","run_type":"SUBMIT_RUN"}}

To get only the exit output:

$ pydbr runs get-output -r 6
Downloaded files: README.txt, links.csv, movies.csv, ratings.csv, tags.csv
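The same value can be extracted from the full JSON output when a script needs the metadata as well. This is a minimal sketch that assumes the payload has the shape shown above; `exit_output` is a hypothetical helper, and treating a non-null `error` field as a failure is a choice made for this example.

```python
import json


def exit_output(payload):
    """Return the notebook exit value, raising if the run reported an error."""
    if payload.get("error"):
        raise RuntimeError(payload["error"])
    return payload["notebook_output"]["result"]


# Trimmed copy of the get-output sample above.
raw = (
    '{"notebook_output":{"result":"Downloaded files: README.txt, links.csv, '
    'movies.csv, ratings.csv, tags.csv","truncated":false},"error":null}'
)

result = exit_output(json.loads(raw))
print(result)
```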

Python client SDK for the Databricks REST APIs

To implement your own Databricks REST API client, you can use the Python client SDK for the Databricks REST APIs.

Create a Databricks connection

import pydbr

# Get Databricks workspace connection
dbc = pydbr.connect(
    bearer_token='dapixyzabcd09rasdf',
    url='https://westeurope.azuredatabricks.net')

DBFS

# Get list of items at path /FileStore
dbc.dbfs.ls('/FileStore')
# Check if file or directory exists
dbc.dbfs.exists('/path/to/heaven')
# Make a directory and its parents
dbc.dbfs.mkdirs('/path/to/heaven')
# Delete a directory recursively
dbc.dbfs.rm('/path', recursive=True)
# Download file block starting at offset 1024 with size 2048
dbc.dbfs.read('/data/movies.csv', 1024, 2048)
# Download entire file
dbc.dbfs.read_all('/data/movies.csv')
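A block-wise read with an offset and a size lends itself to streaming a large file in pieces. The sketch below shows the loop with an in-memory stand-in instead of a real connection. It assumes the read callable returns raw bytes and an empty value at end of file; what `dbc.dbfs.read` actually returns is not confirmed here, so adapt the loop to the real return type. The name `read_in_blocks` is hypothetical.

```python
def read_in_blocks(read_block, path, block_size=2048):
    """Yield successive blocks of a file until an empty block signals the end.

    read_block(path, offset, length) is assumed to return bytes.
    """
    offset = 0
    while True:
        block = read_block(path, offset, block_size)
        if not block:
            break
        yield block
        offset += len(block)


# In-memory stand-in for a remote file system.
_fake_files = {"/data/movies.csv": b"movieId,title\n1,Toy Story\n"}


def fake_read(path, offset, length):
    """Return at most `length` bytes of the fake file starting at `offset`."""
    return _fake_files[path][offset:offset + length]


blocks = list(read_in_blocks(fake_read, "/data/movies.csv", block_size=8))
content = b"".join(blocks)
```

Advancing the offset by the number of bytes actually received, not by the requested size, keeps the loop correct when a read returns a short block.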

Databricks workspace

# List root workspace directory
dbc.workspace.ls('/')
# Check if workspace item exists
dbc.workspace.exists('/explore')
# Check if workspace item is a directory
dbc.workspace.is_directory('/')
# Export notebook in default (SOURCE) format
dbc.workspace.export('/my_notebook')
# Export notebook in HTML format
dbc.workspace.export('/my_notebook', 'HTML')

Build and publish

^{pr31}$
