Dockerfiles for sequencing data pipeline
- guppy_gpu: basecalling of raw ONT MinION data with GPU acceleration
- bonito_gpu: basecalling of raw ONT MinION data with GPU acceleration, using the bonito basecaller
- c3poa: consensus calling of R2C2 reads
- PyIR: alignment of B and T cell receptor sequences with IgBLAST, run through PyIR
- longread_stringtie: pipeline for analyzing long reads with StringTie
- ballgown: plotting and differential expression analysis of StringTie results with ballgown
- Run the Docker containers on the data folder in the following order: (1) guppy_gpu OR bonito_gpu, (2) c3poa (only if you used the R2C2 protocol), (3) PyIR, (4) longread_stringtie, and (5) ballgown. Each container needs the sample's path (e.g. ~/path/to/sample/) mounted.
- The data must first be basecalled with guppy_gpu OR bonito_gpu. Basecalling is kept as a separate step so that it can run on a GPU cluster; see the bonito and guppy documentation.
- On a SLURM-managed cluster, after installing the images as Charliecloud images (see below) and after basecalling:
- Run the SLURM script 01_run_pipeline.cmd on EACH sample
- After 01_run_pipeline.cmd has finished for every sample, run 02_run_merge_stringtie.cmd once, using the base folder containing all samples
- Ease of use: dependencies are installed automatically when the image is built; once built and tested, the image can be moved between machines/servers
- Reproducibility: once the image is built, its behaviour is the same across machines/servers and does not change when the image is used later
- Scalability: test on a local machine/laptop, then run on a workstation or high-performance computing server
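The run order described above can be sketched as a small shell helper. This is only an illustration, not part of the repository: the function names are hypothetical, the image tags are assumed to be the lowercased directory names, `/data` is an assumed mount point inside the container, and `--gpus all` is assumed to be how the GPU images are started. The helper prints the `docker run` commands instead of executing them, so they can be checked first.

```shell
#!/bin/sh
# Print the pipeline stages in run order, one per line.
# $1: basecaller, either guppy_gpu or bonito_gpu
# $2: "r2c2" if the R2C2 protocol was used (optional)
pipeline_stages() {
    basecaller=$1
    protocol=${2:-}
    case "$basecaller" in
        guppy_gpu|bonito_gpu) ;;
        *) echo "unknown basecaller: $basecaller" >&2; return 1 ;;
    esac
    echo "$basecaller"
    # c3poa is only needed for R2C2 reads
    if [ "$protocol" = "r2c2" ]; then
        echo "c3poa"
    fi
    echo "PyIR"
    echo "longread_stringtie"
    echo "ballgown"
}

# Print (do not execute) one `docker run` command per stage.
# $1: sample directory to mount; remaining arguments as for pipeline_stages.
# Assumptions: image tag = lowercased stage name, mount point = /data.
print_docker_commands() {
    sample_dir=$1
    shift
    stages=$(pipeline_stages "$@") || return 1
    for stage in $stages; do
        image=$(printf '%s' "$stage" | tr '[:upper:]' '[:lower:]')
        case "$stage" in
            *_gpu) gpu="--gpus all " ;;   # assumed flag for the GPU images
            *)     gpu="" ;;
        esac
        printf 'docker run --rm %s-v %s:/data %s\n' "$gpu" "$sample_dir" "$image"
    done
}

# Example: list the commands for an R2C2 sample basecalled with guppy
print_docker_commands "$HOME/path/to/sample" guppy_gpu r2c2
```

Leaving out the second argument (`r2c2`) drops the c3poa step from the printed list.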
Convert the Docker image (e.g. c3poa) and export it to a tarball using Charliecloud.
ch-docker2tar [DOCKER IMAGE] ~/
Untar image.
ch-tar2dir [CHARLIE CLOUD IMAGE].tar.gz /path/to/destination
Run the image on the server using charliecloud.
ch-run -w /path/to/destination/[IMAGE] -b ~/path/to/data/ -- sh [SOME SCRIPT].sh
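To show how the placeholders in the three Charliecloud commands above relate to each other, here is a minimal sketch that fills them in for one image and prints the resulting commands without running them. The function name and the example paths are hypothetical. It assumes, as the commands above suggest, that the tarball is named `[IMAGE].tar.gz` and that it unpacks into a subdirectory `[IMAGE]` of the destination.

```shell
#!/bin/sh
# Print the three Charliecloud steps for one image without executing them.
# $1: Docker image tag
# $2: directory that receives the tarball
# $3: directory to unpack the image into
# $4: data directory to bind into the container
# $5: script to run inside the container
print_charliecloud_steps() {
    image=$1
    tar_dir=${2%/}    # strip a trailing slash, if any
    dest=${3%/}
    data=$4
    script=$5
    echo "ch-docker2tar $image $tar_dir"
    # Assumed naming: tarball is <image>.tar.gz, unpacked to <dest>/<image>
    echo "ch-tar2dir $tar_dir/$image.tar.gz $dest"
    echo "ch-run -w $dest/$image -b $data -- sh $script"
}

# Example with made-up paths
print_charliecloud_steps c3poa "$HOME" /path/to/destination "$HOME/path/to/data/" run_c3poa.sh
```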
Submit the job script to the SLURM manager.
sbatch /path/to/script/[SOME SLURM SCRIPT].cmd
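The two pipeline scripts (01_run_pipeline.cmd per sample, then 02_run_merge_stringtie.cmd on the base folder) could be chained with SLURM job dependencies so that the merge step only starts once every per-sample job has succeeded. The sketch below is an assumption about how this might be done, not the repository's own method: `afterok_spec` and `BASE_DIR` are hypothetical names, and it assumes both scripts sit in the current directory and accept the sample or base folder as their first argument. It relies on `sbatch --parsable` (prints the job id) and `sbatch --dependency=afterok:...`.

```shell
#!/bin/sh
# Join job ids into an sbatch dependency specification, e.g. afterok:11:12.
afterok_spec() {
    if [ "$#" -eq 0 ]; then
        echo "afterok_spec: no job ids given" >&2
        return 1
    fi
    spec="afterok"
    for id in "$@"; do
        # On multi-cluster setups `sbatch --parsable` prints "jobid;cluster";
        # keep only the job id.
        spec="$spec:${id%%;*}"
    done
    echo "$spec"
}

# Only submit when sbatch is available and BASE_DIR (hypothetical variable:
# the folder containing one subfolder per sample) is set.
if command -v sbatch >/dev/null 2>&1 && [ -n "${BASE_DIR:-}" ]; then
    ids=""
    for sample in "$BASE_DIR"/*/; do
        # Assumption: the script takes the sample folder as its first argument
        id=$(sbatch --parsable 01_run_pipeline.cmd "$sample") || exit 1
        ids="$ids $id"
    done
    # The merge job starts only after all per-sample jobs finished successfully
    sbatch --dependency="$(afterok_spec $ids)" 02_run_merge_stringtie.cmd "$BASE_DIR"
fi
```

Run it as `BASE_DIR=/path/to/all/samples sh submit_all.sh` (the file name is arbitrary). Without `BASE_DIR` or without `sbatch` on the path, nothing is submitted.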
Inspect SLURM queue.
squeue --clusters=<cluster name>
Inspect SLURM run by id.
scontrol --clusters=<cluster name> show jobid=<job id>
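For scripted checks, the state of a job can be pulled out of the `scontrol` output shown above. The helper below is a minimal sketch with a hypothetical name; it assumes the output contains a `JobState=<STATE>` field, which is the usual `scontrol show` format.

```shell
#!/bin/sh
# Read `scontrol ... show jobid=<job id>` output on stdin and print the
# value of the first JobState field (e.g. PENDING, RUNNING, COMPLETED).
job_state() {
    sed -n 's/.*JobState=\([A-Z_]*\).*/\1/p' | head -n 1
}

# Example on a captured snippet of scontrol output
printf 'JobId=7 JobName=pipeline\n   JobState=RUNNING Reason=None Dependency=(null)\n' | job_state
```

On the cluster, the same helper can be fed directly: `scontrol --clusters=<cluster name> show jobid=<job id> | job_state`.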