Version 13 of the bioinformatic pipeline for SARS-CoV-2 sequence analysis used at Folkehelseinstituttet (FHI)
Docker-based solution for sequence analysis of SARS-CoV-2 Illumina samples
git clone https://github.com/folkehelseinstituttet/FHI_SC2_Pipeline_Illumina/
cd FHI_SC2_Pipeline_Illumina
docker build -t garcianacho/fhisc2:Illumina .
ArticV4:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Illumina SARS-CoV-2_Illumina_Docker_V13.sh ArticV4
ArticV3:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Illumina SARS-CoV-2_Illumina_Docker_V13.sh ArticV3
Note that older versions of Docker might require the --privileged flag, and multi-user systems might require the -u 1000 flag to run.
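As a minimal sketch of how these optional flags fit into the run command, the helper below prints the full command with any extra flags inserted, so it can be reviewed before it is executed. The function name fhi_run_cmd is hypothetical and not part of the pipeline; the image name, mount point and script name are taken from the commands above.

```shell
# Sketch only: fhi_run_cmd is a hypothetical helper, not part of the pipeline.
# It prints the docker command instead of running it.
# Usage: fhi_run_cmd <ArticV3|ArticV4> [extra docker run flags...]
fhi_run_cmd() (
    if [ "$#" -lt 1 ]; then
        echo "usage: fhi_run_cmd <primer scheme> [extra docker flags]" >&2
        exit 2
    fi
    scheme="$1"
    shift
    extra=""
    # Any remaining arguments are inserted verbatim as extra "docker run" flags.
    for flag in "$@"; do
        extra="$extra $flag"
    done
    printf 'docker run -it --rm%s -v %s:/home/docker/Fastq garcianacho/fhisc2:Illumina SARS-CoV-2_Illumina_Docker_V13.sh %s\n' \
        "$extra" "$PWD" "$scheme"
)
```

For example, fhi_run_cmd ArticV4 --privileged -u 1000 prints the ArticV4 command with both flags added and the current directory as the mounted folder.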
The script expects the following folder structure, where the fastq.gz files are placed in a separate folder for each sample:

    ./ExpXX
     |-ExperimentXX.xlsx
     |-Sample1
        |-Sample1_SX_LXXXX_R1.fastq.gz
        |-Sample1_SX_LXXXX_R2.fastq.gz
     |-Sample2
        |-Sample2_SX_LXXXX_R1.fastq.gz
        |-Sample2_SX_LXXXX_R2.fastq.gz
     |-Sample3
        |-Sample3_SX_LXXXX_R1.fastq.gz
        |-Sample3_SX_LXXXX_R2.fastq.gz
     |-...
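The layout above can be checked before starting a run. The following is a rough pre-flight sketch, not part of the pipeline: the function name check_layout is hypothetical, and it only assumes that every sample folder should hold at least one file ending in _R1.fastq.gz and one ending in _R2.fastq.gz.

```shell
# Sketch only: check_layout is a hypothetical pre-flight check.
# Usage: check_layout ./ExpXX
# Returns 0 if every sample folder has R1 and R2 files, 1 otherwise.
check_layout() (
    exp_dir="$1"
    status=0
    found=0
    for sample in "$exp_dir"/*/; do
        [ -d "$sample" ] || continue
        found=1
        name=$(basename "$sample")
        for read in R1 R2; do
            # Expand the glob; if nothing matches, $1 is the literal pattern
            # and the -f test below fails.
            set -- "$sample"*_"$read".fastq.gz
            if [ ! -f "$1" ]; then
                echo "$name: no *_${read}.fastq.gz file found"
                status=1
            fi
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "no sample folders found in $exp_dir"
        status=1
    fi
    exit "$status"
)
```

Running check_layout ./ExpXX lists each sample folder that is missing a read file and returns a non-zero status if any problem was found.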
The script also expects an .xlsx file containing the position of each sample on a 96-well plate and its DNA concentration (alternatively, this column can be used for Ct values). If the file is not properly formatted, the script will run without errors, but the quality-control plot will either not be generated or contain errors. Note that the script takes the name of the experiment from the name of the .xlsx file, so if the file is not found, the names of the output files might be incorrect. It is possible to download a template of the xlsx file here
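Because the experiment name comes from the .xlsx file, it can help to confirm which name will be picked up before running. The sketch below is an illustration only: the function name experiment_name is hypothetical, and it assumes that the experiment folder holds exactly one .xlsx file and that the name is the file name without its extension. The pipeline's own logic may differ.

```shell
# Sketch only: experiment_name is a hypothetical helper.
# Usage: experiment_name ./ExpXX
# Prints the .xlsx file name without its extension, or fails if the folder
# does not contain exactly one .xlsx file.
experiment_name() (
    exp_dir="$1"
    set -- "$exp_dir"/*.xlsx
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one .xlsx file in $exp_dir" >&2
        exit 1
    fi
    basename "$1" .xlsx
)
```

With the folder structure shown earlier, experiment_name ./ExpXX would print ExperimentXX.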
👉 (V13)-Identification of recombinants (see Precfinder for details)
👉 (V13)-Identification of contaminants (see Precfinder for details)
-Summary including mutations found, pangolin lineage, number of reads, coverage, depth, etc.
-Bam files
-Consensus sequences
-Aligned consensus sequences
-Consensus nucleotide sequence for gene S
-Indels and frameshift identification
-Quality-control plot for the plate to detect possible contaminations
-Phylogenetic-tree plot of the samples
-Noise during variant calling across the genome
-Quality-control for contaminations/low-quality samples
-Amplicon efficacy of the selected primer-set for all the samples
This pipeline is based on FHI's base Docker image, which bundles all Linux packages required by the bioinformatic tools plus R v4.1.1. On top of the base image lies a second Docker image containing all the bioinformatic tools required (e.g. Tanoti, nextclade, ivar). The final Docker image is built on the bioinformatic image and adds the Scripts and CommonFiles required to run.
If you want, you can rebuild the two underlying images using the Dockerfiles located in the fhibase and fhibaseillumina folders.
Note that rebuilding the images can lead to broken dependencies, since they are built from public repositories whose contents may have changed.
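A rebuild would follow the layering described above: base image first, then the bioinformatic-tools image, then the final image. The sketch below only prints the three build commands in that order. The function name rebuild_cmds is hypothetical, and the two tags passed as arguments are placeholders: the tags each Dockerfile actually expects are set by its FROM line and should be checked there.

```shell
# Sketch only: rebuild_cmds is a hypothetical helper that prints, but does
# not run, the build commands in dependency order.
# Usage: rebuild_cmds <base image tag> <tools image tag>
# Both tags are placeholders; use the names from the FROM lines of the Dockerfiles.
rebuild_cmds() (
    if [ "$#" -ne 2 ]; then
        echo "usage: rebuild_cmds <base image tag> <tools image tag>" >&2
        exit 2
    fi
    base_tag="$1"
    tools_tag="$2"
    # 1. Base image: Linux packages and R.
    echo "docker build -t $base_tag fhibase"
    # 2. Tools image: bioinformatic tools on top of the base image.
    echo "docker build -t $tools_tag fhibaseillumina"
    # 3. Final image: Scripts and CommonFiles, built from the repository root.
    echo "docker build -t garcianacho/fhisc2:Illumina ."
)
```

Run the printed commands from the root of the cloned repository, one at a time, so that a failure in an earlier layer is noticed before the next one is built.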