- Genome needs to be indexed for BWA; the Fasta file and its index are defined in the config file
- Fastq files are located in a directory defined in the config; paired files named `SAMPLEID1_1.fastq.gz` and `SAMPLEID1_2.fastq.gz` are expected
- All processes and results are stored under the working directory set with the `-d` parameter in Snakemake
- All logs are stored under the `log/` directory
- Dependencies can be provided in a Conda environment as defined in `envs/environment_varcall.yaml` for running in local mode, or loaded as environment modules with `--use-envmodules`
- The pipeline can be run on a cluster (`--slurm`) if configured properly
- Almost all jobs request 10 cores, but this can be scaled with the `--cores N` parameter
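The reference must be indexed before the first run. A minimal sketch, assuming `bwa` and `samtools` are on `PATH` and the index files live next to the Fasta (the path is illustrative):

```shell
# One-time preparation of the reference genome.
# Creates the BWA index files (genome.fa.{amb,ann,bwt,pac,sa}):
bwa index /path/to/genome.fa
# Creates the Fasta index (genome.fa.fai), commonly needed by bcftools mpileup:
samtools faidx /path/to/genome.fa
```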
Defined in `config.yaml`, it requires:

```yaml
samples:
  - sampleID1
  - sampleID2
fastq_dir: /path/to/fastq/dir
genome: /path/to/genome.fa
bcf_mpileup_param: "params for bcftools mpileup"
bcf_call_param: "params for bcftools call"
bcf_norm_param: "params for bcftools norm"
bcf_filter_param: "params for bcftools filter"
```
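As an illustration only (these values are hypothetical examples, not defaults shipped with the pipeline), the bcftools parameter strings could look like:

```yaml
# Illustrative values; tune for your own data
bcf_mpileup_param: "--max-depth 250 -q 20 -Q 20"
bcf_call_param: "-mv --ploidy 2"
bcf_norm_param: "-m -both"
bcf_filter_param: "-e 'QUAL<20 || DP<10'"
```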
```shell
snakemake <params> -d results --configfile config.yaml
```
- Input files are BCF files from the previous workflow
- All processes and results are stored under the working directory set with the `-d` parameter in Snakemake
- All logs are stored under the `log/` directory
- Dependencies can be provided in a Conda environment as defined in `envs/environment_trio_filtering.yaml` for running in local mode, or loaded as environment modules with `--use-envmodules`
- The pipeline can be run on a cluster (`--slurm`) if configured properly
- Almost all jobs request 1 core, but this can be scaled with the `--cores N` parameter
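For example, a cluster run might look like the following (the job count is illustrative, and `--slurm` assumes a properly configured Slurm setup for Snakemake):

```shell
# Hypothetical Slurm submission; --jobs caps concurrent cluster jobs
snakemake -s workflow/trio_filtering.smk --slurm --jobs 10 \
  --use-envmodules -d results --configfile config_trio.yaml
```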
Defined in `config_trio.yaml`, it requires:

```yaml
mother_id: MOM_ID
mother_bcf: /path/to/mother.bcf
father_id: DAD_ID
father_bcf: /path/to/father.bcf
child_id: KID_ID
child_bcf: /path/to/child.bcf
```
```shell
snakemake -s workflow/trio_filtering.smk <params> -d results --configfile config_trio.yaml
```
(C) Juan Caballero 2024