Just Another System for Epityping using NGS data
Jasen produces results for epidemiological and surveillance purposes. Jasen has been developed for a small set of microbiota (primarily MRSA), but will likely work with any bacteria with a stable cgMLST scheme.
- Singularity
- Nextflow (curl -s https://get.nextflow.io | bash)
- Conda
- Singularity Remote Login
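Before cloning, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of jasen itself; the function name missing_tools is hypothetical, and Apptainer users may need to check for apptainer instead of singularity.

```shell
#!/usr/bin/env bash
# Print each tool from the argument list that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not shipped with jasen.)
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v exits non-zero when the name cannot be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage (tool names taken from the list above):
#   missing_tools singularity nextflow conda
```

An empty result means every listed tool was found.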
git clone --recurse-submodules --single-branch --branch master git@github.com:genomic-medicine-sweden/jasen.git && cd jasen
singularity remote login
Note: The containers that need to be built locally require sudo privileges.
cd container
sudo make build_local_containers
make download_remote_containers
cd ..
Note: The main Makefile also attempts to build and/or download the containers (that is, when you run make install in the main repo folder). Building them with sudo beforehand, as shown above, prevents the main script from stopping midway to ask for the sudo password when it reaches this step.
First, make sure you are in the main jasen folder (if you previously changed into the container folder, return to the main folder with cd ..). Then run the install make rule:
make install
Finally, run checks:
make check
Any errors produced during this step will hinder pipeline execution in unexpected ways.
Source: configs/nextflow.base.config
- Edit the root parameter in configs/nextflow.base.config
- Edit the krakenDb, workDir and outdir parameters in configs/nextflow.base.config
- Edit the runOptions in configs/nextflow.base.config in order to mount directories to your run
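To see which values are currently set before editing, a small grep helper can list the relevant lines. This is a sketch under the assumption that the parameters are assigned as name = value on their own lines; the function name show_config_params is hypothetical, and the pattern may need adjusting if the config is laid out differently.

```shell
#!/usr/bin/env bash
# Print the lines (with line numbers) that assign root, krakenDb, workDir or outdir.
# Assumes assignments of the form `name = value`; adjust the pattern otherwise.
show_config_params() {
    grep -nE '^[[:space:]]*(root|krakenDb|workDir|outdir)[[:space:]]*=' "$1"
}

# Example usage:
#   show_config_params configs/nextflow.base.config
```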
When analysing Nanopore data:
- Edit the ext.args for Flye: specify the genome size for the organism of interest with the flag --genome-size
- Edit the ext.seqmethod for Flye depending on the input data
- Edit the ext.args for Medaka: specify the model with the flag -m. It is currently set to r941_min_sup_g507, but it should always be chosen based on how the data was produced. More about choosing the right model can be found in the Medaka documentation.
Source: assets/test_data/samplelist.csv
- Edit the read1 and read2 columns in assets/test_data/samplelist.csv
Source: ~/.bashrc
- Add the export line to ~/.bashrc
- Change SINGULARITY_TMPDIR to APPTAINER_TMPDIR if you are using apptainer
export SINGULARITY_TMPDIR="/tmp" #or equivalent filepath to tmp dir
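The choice between the two variable names can be sketched as a small helper that prints the matching export line. The function name tmpdir_export_line and the example path /scratch/tmp are hypothetical; only the two variable names and the /tmp default come from the text above.

```shell
#!/usr/bin/env bash
# Print the export line for the given container runtime.
# Usage: tmpdir_export_line <singularity|apptainer> [tmpdir]
# (hypothetical helper; the tmp dir defaults to /tmp as in the line above)
tmpdir_export_line() {
    local runtime=$1 tmpdir=${2:-/tmp} var
    case "$runtime" in
        apptainer)   var=APPTAINER_TMPDIR ;;
        singularity) var=SINGULARITY_TMPDIR ;;
        *) echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
    printf 'export %s="%s"\n' "$var" "$tmpdir"
}

# Example usage (path is illustrative):
#   tmpdir_export_line apptainer /scratch/tmp >> ~/.bashrc
```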
Choose between the Kraken DB (64 GB, highly recommended) and the MiniKraken DB (8 GB), or build your own custom database.
wget -O /path/to/kraken_db/krakenstd.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenstd.tar.gz
wget -O /path/to/kraken_db/krakenmini.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenmini.tar.gz
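The two download options above can be wrapped in one helper, sketched below. The function names kraken_db_url and fetch_kraken_db and the archive name kraken.tar.gz are hypothetical; the URLs are the ones listed above. Unlike the plain tar -xf commands, this sketch passes -C so the archive is unpacked into the database directory rather than the current working directory, which is an assumption about where you want the files.

```shell
#!/usr/bin/env bash
# Map a database choice to the download URL listed above.
kraken_db_url() {
    case "$1" in
        standard) echo "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz" ;;
        mini)     echo "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz" ;;
        *) echo "unknown choice: $1 (expected standard or mini)" >&2; return 1 ;;
    esac
}

# Download the chosen archive into db_dir and unpack it there.
# Usage: fetch_kraken_db <standard|mini> /path/to/kraken_db
fetch_kraken_db() {
    local choice=$1 db_dir=$2 url
    url=$(kraken_db_url "$choice") || return 1
    mkdir -p "$db_dir"
    wget -O "$db_dir/kraken.tar.gz" "$url"
    tar -xf "$db_dir/kraken.tar.gz" -C "$db_dir"
}
```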
bash /path/to/jasen/assets/mlst_db/update_mlst_db.sh
git clone git@github.com:ryanjameskennedy/jasentool.git && cd jasentool
pip install .
jasentool converge --output_dir /path/to/jasen/assets/tbdb
cd /path/to/jasen/assets/tbdb
tb-profiler create_db --prefix converged_who_fohm_tbdb
tb-profiler load_library converged_who_fohm_tbdb
bgzip -c converged_who_fohm_tbdb.bed > /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz
tabix -p bed /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz
nextflow run main.nf -profile staphylococcus_aureus -config configs/nextflow.base.config --csv assets/test_data/samplelist.csv
Argument type | Options | Required |
---|---|---|
-profile | staphylococcus_aureus/escherichia_coli | True |
-entry | bacterial_default | True |
-config | nextflow.base.config | True |
-resume | NA | False |
--output | user specified | False |
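To try different profiles or add the optional -resume flag without retyping the full command, the invocation can be assembled by a small helper that only prints the command for inspection. This is a sketch; jasen_run_cmd is a hypothetical name, and it reproduces only the arguments used in the example command above.

```shell
#!/usr/bin/env bash
# Print a nextflow command for the given profile and sample list.
# Pass "resume" as the third argument to append -resume.
# (hypothetical helper; it prints the command and does not run it)
jasen_run_cmd() {
    local profile=$1 csv=$2 resume=${3:-}
    local cmd="nextflow run main.nf -profile $profile -config configs/nextflow.base.config --csv $csv"
    if [ "$resume" = "resume" ]; then
        cmd="$cmd -resume"
    fi
    printf '%s\n' "$cmd"
}

# Example usage:
#   jasen_run_cmd escherichia_coli assets/test_data/samplelist.csv resume
```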
id,platform,read1,read2
p1,illumina,assets/test_data/sequencing_data/saureus_10k/saureus_large_R1_001.fastq.gz,assets/test_data/sequencing_data/saureus_10k/saureus_large_R2_001.fastq.gz
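A quick sanity check of a sample list in the format shown above can catch wrong paths before the pipeline starts. The sketch below is not part of jasen; check_samplelist is a hypothetical name, and it assumes plain comma-separated fields without quoting or commas inside paths.

```shell
#!/usr/bin/env bash
# Check a sample list: the header must be id,platform,read1,read2 and every
# non-empty read1/read2 path must point to an existing file.
# Prints one line per problem and returns non-zero if any were found.
check_samplelist() {
    local csv=$1 header="" id platform read1 read2 path problems=0
    {
        IFS= read -r header || true
        if [ "$header" != "id,platform,read1,read2" ]; then
            echo "unexpected header: $header"
            problems=1
        fi
        # The extra test keeps a final line that lacks a trailing newline
        while IFS=, read -r id platform read1 read2 || [ -n "$id" ]; do
            for path in "$read1" "$read2"; do
                if [ -n "$path" ] && [ ! -f "$path" ]; then
                    echo "$id: missing file $path"
                    problems=1
                fi
            done
        done
    } < "$csv"
    return "$problems"
}

# Example usage:
#   check_samplelist assets/test_data/samplelist.csv
```

An empty read2 field is skipped rather than reported, so single-file samples pass the check.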
- Kraken2: Species detection.
- Bracken: Combined with Kraken2 for species detection.
- bwa mem: Maps reads to cgMLST loci (demarcated by a bed file) in order to estimate genome coverage. Low levels of intra-species contamination or erroneous mapping are removed using bwa and by filtering out heterozygous mapped bases.
- interquartile range: Calculates evenness of coverage.
- SPAdes: De novo assembly for Ion Torrent.
- SKESA: De novo assembly for Illumina.
- QUAST: Extracts QC data (De novo assembly parameters) from the assembly.
- Flye: De novo assembly for Oxford Nanopore Technologies (ONT).
- Medaka: Creates consensus sequences from ONT data.
- chewBBACA: Calculates cgMLST from the extracted alleles as defined by the schema. The number of missing loci is calculated and used as a QC parameter.
- cgmlst.net: The cgMLST reference schema.
- mlst: Calculates traditional 7-locus MLST.
staphylococcus_aureus
escherichia_coli
klebsiella_pneumoniae
mycobacterium_tuberculosis
- resfinder: Detects antimicrobial resistance genes as well as environmental and chemical resistance genes.
- pointfinder: Combines with resfinder to detect variants.
- virulencefinder: Detects virulence genes.
- amrfinderplus: Detects antimicrobial resistance genes as well as environmental, chemical resistance and virulence genes.
- resfinder_db: Resfinder database.
- pointfinder_db: Pointfinder database.
- virulencefinder_db: Virulencefinder database.
- sourmash: Determine relatedness between samples.
- Bonsai: Visualises jasen outputs.
- grapetree: Visualises phylogenetic relationships using cgMLST data.
- Always run the latest versions of the bioinformatics software.
- Verify you have execution permission for jasen's *.sif images.
- Old Singularity versions may sporadically produce the error FATAL: could not open image jasen/container/*.sif: image format not recognized!
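The permission check mentioned above can be sketched as a loop over the image files. The function name list_nonexec_sif is hypothetical, and the container directory in the usage line is assumed to be the one used in the build step.

```shell
#!/usr/bin/env bash
# Print every *.sif file in the given directory that the current user
# cannot execute. (hypothetical helper, not part of jasen)
list_nonexec_sif() {
    local dir=$1 image
    for image in "$dir"/*.sif; do
        [ -e "$image" ] || continue          # glob matched nothing
        [ -x "$image" ] || printf '%s\n' "$image"
    done
}

# Example usage:
#   list_nonexec_sif container
# Any file listed can be fixed with: chmod +x <file>
```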