Docker Containers that work on Databricks clusters

Primary language: Dockerfile. License: MIT.

docker-retina-databricks

Docker images for use with Databricks Container Services

This repo contains the code used to create custom containers with full support for:

  • Python
  • R
    • with MRAN to lock package versions by date
    • with littler for simpler command-line installation of R packages
  • Scala
  • Java and Jar files
  • DBFS mounts
  • ssh
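
As an illustration of the littler support listed above, a child image could install an extra R package with a single command. This is only a sketch under assumptions: it assumes retina/databricks-minimal is used as the parent image and that littler's install2.r helper script is on the PATH there, which this README does not confirm. The package name data.table is a placeholder.

```dockerfile
# Hypothetical child image; the parent image name comes from the Images list,
# everything else here is an assumption for illustration.
FROM retina/databricks-minimal

# install2.r ships with littler; --error makes the build fail if the
# package cannot be installed instead of silently continuing.
# Assumption: install2.r is on PATH in the parent image.
RUN install2.r --error data.table
```

The installed version would come from whichever repository the parent image has configured, which is where the date-locked MRAN snapshot would take effect.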

These containers are built on Ubuntu and try to use the latest versions of dependencies that work with the Databricks Runtime 6.x. Note that the "ML" versions of the Databricks Runtime do not currently work with custom Docker containers such as these.

Not yet implemented:

  • Ganglia

Images:

  • retina/databricks-minimal installs just the basics to be able to run notebooks. For Python, this includes pandas and numpy. For R, it includes the tidyverse.
  • retina/databricks-standard adds the standard dependencies, aiming for parity with the out-of-the-box Databricks Runtime 6.x

Notes:

  • This builds upon the Databricks example containers to standardize multi-language support (R is installed by default) and to use more current dependencies.
  • When installing Python packages in child containers, be sure to use conda activate $DATABRICKS_ROOT_CONDA_ENV in your Dockerfile, so the packages land in the environment that Databricks uses.
  • Spark 2.x is not compatible with Python 3.8, so these images use Python 3.7
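
The note above about activating the conda environment in child containers might look roughly like the following. This is a sketch, not a tested recipe: it assumes retina/databricks-minimal is the parent image, that it defines DATABRICKS_ROOT_CONDA_ENV, and that its shell setup lets conda activate run inside a RUN step. The package name requests is a placeholder.

```dockerfile
# Hypothetical child image built on one of the images listed above.
FROM retina/databricks-minimal

# Each RUN step starts a fresh shell, so the activation and the install
# have to happen in the same step.
# Assumption: conda is initialized for the shell that RUN uses in the
# parent image; if it is not, this step would need a different SHELL setup.
RUN conda activate $DATABRICKS_ROOT_CONDA_ENV \
    && pip install requests
```

Without the activation, pip may install into a different Python than the one notebooks run on, so the package would appear to be missing on the cluster.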