databricks/databricks-cli

Misalignment between Databricks CLI and Databricks APIs - importing generic file to workspace

luigibrancati opened this issue · 1 comment

We use Databricks on Azure and have a few init scripts that are deployed automatically through Azure DevOps release pipelines.

Following this announcement about vulnerabilities in init scripts, we decided to move our init scripts (bash scripts) from DBFS to the Workspace. While doing so, we encountered a few issues:

  • There's no task for Azure DevOps release pipelines that deploys anything other than notebooks to the Databricks Workspace
  • The Databricks CLI command databricks workspace import can only import notebooks, since it requires the language option

We solved this by falling back to curl and calling the Databricks API directly. The API only requires a base64-encoded string, and the language field is optional.

Example code

#!/bin/bash
# script.sh - Example init script
pip install numpy

# Databricks CLI command - doesn't work: the language option is mandatory, so the script can't be imported as a plain file
databricks workspace import ./script.sh /InitScripts/script.sh --language PYTHON --profile AZDO --overwrite

# Databricks API - works
export encoded=$(base64 -w 0 ./script.sh)
curl --location '<workspace>/api/2.0/workspace/import' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{"format": "AUTO", "path": "/InitScripts/script.sh", "content": "'$encoded'", "overwrite": "true"}'

I think I can open a PR if this isn't intentional