Eldorado is designed to streamline the process of basecalling sequencing runs from a ONT PromethION sequencer. It provides a suite of features including pseudo-live basecalling, merging, demultiplexing, and cleanup. Eldorado is designed to be run on a high-performance computing (HPC) cluster and uses the SLURM job scheduler to manage the sequencing runs and takes advantage of the GPU resources available on the cluster.
One of the key features of Eldorado is the scheduler
subtool. The scheduler
is responsible for managing the sequencing runs for each project. It reads the project configuration file and schedules the basecalling, merging, demultiplexing, and cleanup jobs on the cluster. The scheduler
is designed to be run as a Cron job and can be configured to run at regular intervals to process new sequencing runs as they are uploaded to the cluster. The scheduler
provides detailed logging of the basecalling and can be configured to send email notifications when jobs are completed or if an error occurs.
To get started with Eldorado, you need to install the package on your local machine. You can install Eldorado from a local repository or from GitHub. To install Eldorado from a GitHub download the recipe (meta.yaml
) file from the conda_recipe
directory and build the package with conda build
:
-
In the
conda_recipe
directory use themeta.yaml
file to build the package withconda build
:# Build directory BUILD_DIR="/path/to/local/conda/repo/" cd $BUILD_DIR # Download the meta.yaml file curl -O https://raw.githubusercontent.com/MOMA-AUH/eldorado/main/conda_recipe/meta.yaml # Build the package LOCAL_REPO="/path/to/local/conda/repo/" conda build $BUILD_DIR --output-folder $LOCAL_REPO # Clean up conda build purge
-
Create a new conda environment:
conda create -n eldorado-env
-
Activate the conda environment:
conda activate eldorado-env
-
Install the package in the new conda environment:
conda install -c /path/to/local/conda/repo/ eldorado
-
Install the package dependencies:
conda install -c conda-forge -c bioconda -c defaults --file /path/to/local/conda/repo/conda_build_scripts/conda_requirements.txt
-
Activate the conda environment:
conda activate eldorado-env
-
Eldorado is now installed in your local conda environment. You can start using it by following the instructions in the Usage section.
To install an editable version of Eldorado from a local repository, run the following commands:
ENV_NAME="eldorado-env"
git clone https://github.com/MOMA-AUH/eldorado.git
conda create -n $ENV_NAME python=3.10 samtools=1.20 # Note that samtools is required for the merge step
conda activate $ENV_NAME
cd /path/to/eldorado/repository
pip install -e .
To use the scheduler
, you need to provide several command-line arguments:
--root-dir
or-r
: The root directory of your project.--models-dir
or-m
: The path to the models directory.--project-config
or-c
: The path to the project configuration file (.csv).--min-batch-size
or-b
: The minimum batch size (default is 1).--log-file
or-l
: The path to the log file (optional).--mail-user
or-u
: The email address for notifications (optional).--dry-run
or-d
: If set, the scheduler will perform a dry run (optional).
Here is an example of how to use the scheduler
:
python eldorado scheduler \
--root-dir /path/to/root \
--models-dir /path/to/models \
--project-config /path/to/config.csv \
--min-batch-size 10 \
--log-file /path/to/logfile.log \
--mail-user example@example.com
--dry-run
Eldorado is designed to run in three main stages: basecalling, merging, and demultiplexing. The scheduler
is responsible for managing these stages and scheduling the jobs on the cluster. Furthermore the scheduler
handles logging, continous monitoring of lock files and cleanup of temporary directories and files. Each stage works as follows:
The basecalling stage is responsible for running the Dorado basecaller on the sequencing reads. The scheduler
reads the pod5
files from the sequencing run and submits the basecalling of any new files to the job queue. The basecalling is run on the GPU nodes of the cluster.
The merging stage is responsible for merging the basecalled reads from the individual basecalling batches into a single file using samtools
. Before merging the basecalled reads, the scheduler
checks if all pod5
files have been basecalled successfully. If all files have been basecalled, the scheduler
submits the merging job to the job queue.
The demultiplexing stage is responsible for demultiplexing the merged reads into individual samples using dorado demux
. The scheduler
checks that the sample sheet from the sequencing run is available and has barcode
and ´alias´ columns, and that the used kit requires demultiplexing. If the conditions are met, the scheduler
submits the demultiplexing job to the job queue. If not, the scheduler
skips the demultiplexing stage and simply uses the merged reads as the final output.
ElDorado is placed in the following repository on GenomeDK:
/faststorage/project/MomaReference/BACKUP/nanopore/software/conda/repo/
The following directory is used on GDK for the Eldorado execution, logging and configuration files:
/faststorage/project/MomaNanoporeDevelopment/BACKUP/eldoarado/
The folder contains:
config
folder with the configuration fileslogs
folder with the log filesrun_elodarado.sh
script to run Eldorado