A DNAnexus app for running the latest Hail on a Spark cluster in JupyterLab. This app was created as an up-to-date alternative to the DXJupyterLab Spark Cluster app, which only supported Hail 0.2.78 (19/10/2021).
Maintained by Barney Hill (barney.hill@ndph.ox.ac.uk) for use within the Lindgren group. This is an unofficial community app and is not associated with DNAnexus.
git clone git@github.com:lindgrengroup/hail-on-dnanexus.git
dx build hail-on-dnanexus -f
dx run hail-on-dnanexus
- About 5 minutes after the app is launched, the JupyterLab link becomes accessible through the DNAnexus Monitor page. Once the cluster has initialised, the Spark control panel can be accessed at https://${JOBID}.dnanexus.cloud:8081/jobs, where ${JOBID} is the ID of the running job.
- Once in JupyterLab, the extra code described in this thread is required to get full read/write functionality: https://discuss.hail.is/t/how-should-i-use-hail-on-the-dnanexus-rap/2277
- DNAnexus files can be accessed directly with the prefix "file:///mnt/project/".
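
As a rough illustration of the two conventions above, the following sketch builds the Spark control panel URL and a "file:///mnt/project/" URI from plain strings. The helper names (`spark_ui_url`, `project_uri`, `load_table`), the example job ID, and the example file path are hypothetical and not part of the app; `hl.import_table` is the standard Hail call for delimited text, and whether it works without the extra setup from the linked thread is an assumption.

```python
# Prefix under which DNAnexus files are visible, as stated above.
PROJECT_PREFIX = "file:///mnt/project/"


def spark_ui_url(job_id: str) -> str:
    """Build the Spark control panel URL for a job ID such as 'job-XXXX'."""
    return f"https://{job_id}.dnanexus.cloud:8081/jobs"


def project_uri(relative_path: str) -> str:
    """Turn a path relative to the project root into a file:// URI."""
    # Strip leading slashes so the prefix is not followed by a second '/'.
    return PROJECT_PREFIX + relative_path.lstrip("/")


def load_table(relative_path: str):
    """Read a delimited text file from the project into a Hail Table."""
    # Imported lazily so the helpers above can be used without Hail installed.
    import hail as hl

    return hl.import_table(project_uri(relative_path), impute=True)
```

In a notebook, `load_table("data/phenotypes.tsv")` would then read the (hypothetical) file `data/phenotypes.tsv` from the project.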
- Python Script (optional) - the app can execute a Hail script instead of starting an interactive notebook.
- Bash Script (optional) - the provided string is executed before the main computation, which is useful for passing batch parameters.
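
A script passed as the Python Script input could look something like the sketch below, which runs a small aggregation and compares it with a value computed in plain Python. This is only an assumed smoke test: the function names are hypothetical, and it assumes that a bare `hl.init()` is sufficient in the app's environment, which may not hold if the extra setup from the thread linked above is needed.

```python
def expected_sum_of_squares(n_rows: int) -> int:
    """Reference value computed in plain Python, without Spark."""
    return sum(i * i for i in range(n_rows))


def hail_sum_of_squares(n_rows: int) -> int:
    """Compute the same value with Hail on the cluster."""
    # Imported lazily so the reference function works without Hail installed.
    import hail as hl

    hl.init()  # assumption: default initialisation works inside the app
    ht = hl.utils.range_table(n_rows)           # table with a single field, idx
    ht = ht.annotate(squared=ht.idx * ht.idx)   # add the squared index
    return ht.aggregate(hl.agg.sum(ht.squared))


def main(n_rows: int = 10) -> None:
    """Fail loudly if the cluster result differs from the reference."""
    result = hail_sum_of_squares(n_rows)
    assert result == expected_sum_of_squares(n_rows), result
    print(f"Hail smoke test passed: {result}")
```

To use this as a standalone script, add a final line calling `main()`.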
- Hail 0.2.108 (latest as of 18/01/23)
- Spark 3.2.0
- Containerise Hail + JupyterLab with automatic builds as below.
- Automatically create the latest Hail builds on Spark 3.2.0 for use in the app.
- Allow non-interactive execution of a Hail Python script.