This snakemake workflow consists of two linked workflows. The first takes one or more "run" files (R1 and R2) and demultiplexes them into multiple files, using a metadata file that links each sample ID to its unique barcode. Once the files have been demultiplexed, the individual files are processed in parallel through a sequence of processing steps.
- The first workflow is executed by the Snakefile demultiplex_Snakefile. It defines its sample IDs by parsing the provided metadata file, which contains the one-to-one mapping of sample ID to barcode.
- The second workflow is defined by Snakefile. It declares the demultiplexing workflow as its "sub-workflow" and requires that it complete before any of the downstream processing steps are launched.
To launch this workflow properly, you must:
- Have miniconda3 installed. If it is not installed, you can install it with the following commands.
$ cd ~
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
- Install snakemake through conda into your base environment.
$ conda install -c bioconda -c conda-forge snakemake
- Clone this repository into your preferred location.
$ git clone https://github.com/ohsu-cedar-comp-hub/DIDA.git
$ cd DIDA
Once in the directory DIDA, you must upload the files needed for demultiplexing to run properly. Please upload a tab-delimited text file following this format:
sample1 | gtcaNNNNgtcaNNNN |
sample2 | aagtNNNNaagtNNNN |
sample3 | aaccNNNNaaccNNNN |
... | ... |
Note that snakemake determines which samples to run in parallel by parsing this file, so please include only the sample IDs and barcodes that were actually run.
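Before launching, it can help to confirm that the metadata file really is a one-to-one mapping. The following is a minimal sketch of such a check, not the parsing code used by demultiplex_Snakefile; the function name read_barcodes and the in-memory example data are assumptions made for illustration.

```python
import csv
import io


def read_barcodes(handle):
    """Return {sample_id: barcode} from a two-column tab-delimited file.

    Hypothetical helper: raises ValueError if a line does not have exactly
    two non-empty columns, or if a sample ID or barcode appears twice.
    """
    mapping = {}
    owner_of = {}  # barcode -> sample ID that already uses it
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        # Drop empty fields so a trailing tab does not count as a column.
        fields = [field.strip() for field in row if field.strip()]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        sample, barcode = fields
        if sample in mapping:
            raise ValueError(f"line {line_no}: duplicate sample ID {sample}")
        if barcode in owner_of:
            raise ValueError(
                f"line {line_no}: barcode {barcode} already used by {owner_of[barcode]}"
            )
        mapping[sample] = barcode
        owner_of[barcode] = sample
    return mapping


# Example data mirroring the format shown above (not a real run).
example = io.StringIO("sample1\tgtcaNNNNgtcaNNNN\nsample2\taagtNNNNaagtNNNN\n")
barcodes = read_barcodes(example)
print(sorted(barcodes))
```

To check a real file, pass an open file handle instead of the io.StringIO object, for example with open("/path/to/metadata.txt", newline="") as handle.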
Once this file is uploaded, write the absolute path to it under the barcodes header in the file omic_config.yaml. The omic_config.yaml file is also where you can alter the following variables, which may change depending on your run:
- barcodes - Metadata file which links sample ID to barcode
- ref_genome - Reference genome fasta file
- num_pcr - Number of processors
- gd_cutoff
- gd_Ncutoff
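A quick sanity check of these settings before a run might look like the sketch below. It assumes the YAML has already been loaded into a plain dictionary (inside a Snakefile that declares a configfile, snakemake exposes the parsed values as config), and it assumes that all five keys are required. The function name check_config and the example values are hypothetical.

```python
import os

# Keys listed above; treating all of them as required is an assumption.
REQUIRED_KEYS = ("barcodes", "ref_genome", "num_pcr", "gd_cutoff", "gd_Ncutoff")


def check_config(config):
    """Return a list of human-readable problems found in the config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    barcodes = config.get("barcodes")
    # The barcodes entry is expected to be an absolute path.
    if barcodes is not None and not os.path.isabs(str(barcodes)):
        problems.append(f"barcodes should be an absolute path, got: {barcodes}")
    return problems


# Illustrative values only, not taken from a real omic_config.yaml.
example = {
    "barcodes": "metadata/barcodes.txt",
    "ref_genome": "/path/to/genome.fa",
    "num_pcr": 1,
}
for problem in check_config(example):
    print(problem)
```

An empty list means that none of the checked problems were found. The sketch does not validate the values of gd_cutoff or gd_Ncutoff, since their expected types and ranges are not described here.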
Once you have uploaded your metadata and edited omic_config.yaml to your liking, please symbolically link or copy your raw sequencing files to the directory samples/raw within your working directory.
$ ln -s /path/to/data/* samples/raw
OR
$ cp /path/to/data/* samples/raw
To test that your workflow is set up correctly, you can do a "dry-run" of your snakemake workflow. This builds the Directed Acyclic Graph (DAG) of every job that would be launched and lists those jobs without executing them. Any syntax errors will be reported here.
$ snakemake -np --verbose
If the dry-run looks good, then you can launch the workflow with this command:
$ sbatch submit_snakemake.sh