b-data/data-science-devcontainers

Request: add basic Java support to run PySpark

dkapitan opened this issue · 5 comments

Context

I am working with FHIR and want to use pathling in my devcontainer. Note that I only want the libraries, not the full server. My aim is to set up a 'serverless composable FHIR data stack', see the working paper here.

Pathling needs Java to run; Azul OpenJDK is recommended. Zulu supports various chipsets.

Feature request

Would it make sense to include Java as an option on all of the devcontainers, in such a way that during the build it also looks for the proper version?

My solution

Currently I have just added this in onCreateCommand.sh:

# install Azul OpenJDK required for Spark stack, including pathling
# -y keeps apt from waiting for confirmation in the non-interactive build
sudo apt update
sudo apt install -y gnupg ca-certificates curl
curl -s https://repos.azul.com/azul-repo.key | sudo gpg --dearmor -o /usr/share/keyrings/azul.gpg
echo "deb [signed-by=/usr/share/keyrings/azul.gpg] https://repos.azul.com/zulu/deb stable main" | sudo tee /etc/apt/sources.list.d/zulu.list
sudo apt update
sudo apt install -y zulu21-jdk
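
A lifecycle script like this may run against an image that already ships a JDK, so it can help to guard the installation. The following is a minimal sketch, not part of the original script: `has_command` and `install_jdk_if_missing` are hypothetical helper names, and the default package name is an assumption that can be overridden.

```shell
#!/usr/bin/env bash
# Sketch: skip the JDK installation when a `java` binary is already on PATH.
# `has_command` and `install_jdk_if_missing` are hypothetical helper names.

# Succeeds if the given command can be found on PATH.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

# Installs the given apt package (assumed default: default-jdk) only if java is missing.
install_jdk_if_missing() {
  local pkg="${1:-default-jdk}"
  if has_command java; then
    echo "java already installed, skipping ${pkg}"
    return 0
  fi
  sudo apt-get update
  sudo apt-get install -y "${pkg}"
}
```

Calling `install_jdk_if_missing zulu21-jdk` at the end of the script would then only touch apt when no `java` is available.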

@dkapitan Could you try the default JDK first, i.e. add default-jdk to packages at

// A comma separated list of packages to install
"ghcr.io/rocker-org/devcontainer-features/apt-packages:1": {
  "packages": "qpdf"
}

or

// A comma separated list of packages to install
"ghcr.io/rocker-org/devcontainer-features/apt-packages:1": {
  "packages": ""
}


This will install OpenJDK; currently version 17 for the regular (Debian-based) Data Science Dev Containers and version 11 for the GPU accelerated (Ubuntu-based) Data Science Dev Containers.


Both are supposed to work, and IMHO the Pathling documentation should say

All variants of the Pathling library require minimum version 11 of a Java Virtual Machine (JVM) to be installed.

at https://pathling.csiro.au/docs/libraries/installation#java-virtual-machine.
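
To confirm that whichever JDK ends up in the container meets a minimum of version 11, the major version can be read from `java -version`. This is a minimal sketch; the helper names `java_major`, `detect_java_version` and `meets_minimum` are hypothetical, and it assumes the usual `version "..."` line in the output.

```shell
#!/usr/bin/env bash
# Sketch: check the installed Java major version against a minimum.
# All function names here are hypothetical helpers.

# Extract the major version from a Java version string.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
  esac
  # Keep only the leading run of digits.
  printf '%s\n' "${v%%[!0-9]*}"
}

# Read the quoted version string from `java -version` (printed on stderr).
detect_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds if version string $1 has a major version >= $2 (default 11).
meets_minimum() {
  [ "$(java_major "$1")" -ge "${2:-11}" ]
}
```

For example, `meets_minimum "$(detect_java_version)" 11 && echo "Java is recent enough"` could be run once inside the container.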

P.S.: The following Data Science Dev Containers already have default-jdk installed:

  1. R verse
  2. R geospatial
  3. R qgisprocess
  4. CUDA R verse
  5. CUDA R geospatial
  6. CUDA R qgisprocess

because it is required for the rJava package, a low-level R to Java interface.

@benz0li thanks for following up so quickly. I will try your suggestions.

Would it make sense to include Java as an option on all of the devcontainers

Java can be installed at runtime by adding default-jdk via the apt-packages Dev Container Feature.

in such a way that during the build it also looks for the proper version?

Java versions are expected to be binary backwards-compatible. JDK 17 can run code compiled by JDK 11 or JDK 8.
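
The compatibility rule can be expressed through class-file versions: a JDK release N emits class-file major version N + 44 by default (JDK 8 gives 52, JDK 11 gives 55, JDK 17 gives 61), and a runtime loads any class file whose major version does not exceed its own. A small sketch, with hypothetical helper names and assuming code compiled at each compiler's default target level:

```shell
#!/usr/bin/env bash
# Sketch: model Java's backwards compatibility via class-file major versions.
# `class_file_major` and `can_run` are hypothetical helper names.

# Default class-file major version produced by JDK release $1 (8 -> 52, 11 -> 55, 17 -> 61).
class_file_major() {
  echo $(( $1 + 44 ))
}

# Succeeds if a runtime of release $1 can load classes compiled by JDK release $2.
can_run() {
  [ "$(class_file_major "$1")" -ge "$(class_file_major "$2")" ]
}
```

With these helpers, `can_run 17 11` succeeds while `can_run 11 17` fails, which matches the statement that a newer JDK runs code compiled by an older one but not the reverse.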

Solution with apt-packages works.