Using mamba 0.9.2 (or conda 4.8.4), install the following:
mamba create -n ptera_v2 \
python=3.9.6 snakemake=6.7.0 peppy=0.31.1 sra-tools=2.11.0 tabix=1.11 \
r-tidyverse=1.3.1 r-tidymodels=0.1.3 r-sqldf=0.4_11 \
xsv=0.13.0 bioconductor-deseq2=1.34.0 bioconductor-preprocesscore=1.54.0 \
bioconductor-edaseq=2.26.0 samtools=1.14 bioconductor-eisar=1.6.0 bioconductor-bsgenome=1.62.0 \
r-wgcna=1.69
Make sure to pre-install the MACS environment once via basilisk:
basilisk::basiliskStart(MACSr:::env_macs)
Singularity must also be installed and accessible in this environment. Depending on your system, this can be done via conda. On HPC systems, admin privileges may be required.
Other dependencies are handled at runtime by singularity.
Accessions are included in the pepfiles for each subworkflow. The sample table for the primary data (DGRP RNA-seq) was generated by downloading all metadata for PRJNA483441 and processing the metadata file with a custom R script (see workflow/scripts).
Other data were similarly pulled from SRA/GEO.
This pipeline requires a resources folder, which should be downloaded and placed in the pipeline directory (level with config/, for example).
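As a quick sanity check before launching, the layout described above can be verified with a few lines of Python. This is only a sketch: the function name `missing_top_level_dirs` is hypothetical and not part of the pipeline, and it assumes the folders are literally named `config` and `resources` and sit directly under the pipeline directory.

```python
from pathlib import Path


def missing_top_level_dirs(pipeline_dir, required=("config", "resources")):
    """Return the names in `required` that are not directories directly under pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in required if not (root / name).is_dir()]


# Run from the pipeline directory; an empty list means both folders were found.
missing = missing_top_level_dirs(".")
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("config/ and resources/ are both present")
```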
New data can be processed in subworkflows. Each sample table should resemble the SRA run selector export. It is recommended to clean this table with a script saved in the workflow/scripts directory of each subworkflow.
Each table should have at a minimum the following columns:
sample_name
LibraryLayout
Experiment
Run
Library
ChIP-seq data should additionally have an input column.
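The column requirements above can be checked before a sample table is handed to a subworkflow. The following is a minimal sketch using only the Python standard library; the helper `missing_columns` and the example rows are hypothetical, and it assumes a delimited text table (comma-separated by default) whose first row is the header.

```python
import csv
import io

# Columns listed above as the minimum for every sample table.
REQUIRED_COLUMNS = ["sample_name", "LibraryLayout", "Experiment", "Run", "Library"]


def missing_columns(table_text, chip_seq=False, delimiter=","):
    """Return the required columns that are absent from the table header."""
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    header = next(reader, [])  # empty table -> empty header
    # ChIP-seq tables additionally need an "input" column.
    required = REQUIRED_COLUMNS + (["input"] if chip_seq else [])
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# Hypothetical one-sample table, for illustration only.
example = (
    "sample_name,LibraryLayout,Experiment,Run,Library\n"
    "s1,SINGLE,EXP1,RUN1,lib1\n"
)
print(missing_columns(example))                 # []
print(missing_columns(example, chip_seq=True))  # ['input']
```

For a tab-separated export, pass `delimiter="\t"`. The check only inspects the header; it does not validate the values in each row.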
Note that by default RUN_TYPE=TEST for the resource-intensive workflows. This is useful for testing individual pipeline components (i.e. subworkflows or the initial portions of the main workflow). However, because it is difficult to ensure that equivalent strain samples are always processed in the test dataset for each subworkflow, test mode is not guaranteed to complete the full workflow (main workflow + subworkflows). To run the full workflow, use RUN_TYPE=FULL:
snakemake --profile <your profile> --use-conda --use-singularity -j 999 -kp \
--config RUN_TYPE=FULL -n
If disk space is at a premium, consider running with --prioritize salmon_quant_se_vanilla to enforce creation of the quant output and deletion of temp files for each sample as quickly as possible.