This guide describes how to run the RNA_SEQ Snakemake workflow for differential gene expression analysis on samples that have no replicates.
Create a project folder and assign it a meaningful Project_ID.
Copy the following files into the project folder:
Snakefile
DEGSeq_no_replicate_final.R
create_combinations.R
config.yaml
Master_file.txt
Inside the project folder, create a sub-folder named 1_Data.
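The setup steps above can be sketched as a small shell helper. This is only an illustration: the function name setup_project and its two arguments (the folder that currently holds the workflow files, and the Project_ID to create) are assumptions, not part of the workflow itself.

```shell
# Sketch: create the project folder, its 1_Data sub-folder, and copy the
# five workflow files into it. setup_project is a hypothetical helper name.
setup_project() {
    local src_dir="$1"      # folder that currently holds the workflow files
    local project_id="$2"   # name (or path) of the project folder to create

    # Creates both the project folder and the 1_Data sub-folder.
    mkdir -p "$project_id/1_Data" || return 1

    local f
    for f in Snakefile DEGSeq_no_replicate_final.R create_combinations.R \
             config.yaml Master_file.txt; do
        cp "$src_dir/$f" "$project_id/" || return 1
    done
}
```

It could be called as setup_project <folder_with_workflow_files> <Project_ID>; the function stops with a non-zero status if any of the listed files cannot be copied.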
Copy the sample files into the 1_Data folder. Replace every hyphen (-) in the file names with an underscore (_). For example, Tumor-1_R1.fq.gz should be renamed to Tumor_1_R1.fq.gz.
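Renaming many files by hand is error-prone, so the hyphen-to-underscore step can be scripted. The following is a minimal sketch; the function name rename_hyphens is an assumption, and it takes the data folder as its argument (defaulting to 1_Data).

```shell
# Sketch: replace every hyphen with an underscore in the names of the files
# inside a folder. rename_hyphens is a hypothetical helper name.
rename_hyphens() {
    local dir="${1:-1_Data}"
    local path base new
    for path in "$dir"/*-*; do
        [ -e "$path" ] || continue          # the glob matched nothing
        base=$(basename "$path")
        new=$(printf '%s' "$base" | sed 's/-/_/g')
        if [ -e "$dir/$new" ]; then
            # Never overwrite an existing file; report and move on.
            echo "skip: $dir/$new already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

Running rename_hyphens 1_Data from the project folder would turn Tumor-1_R1.fq.gz into Tumor_1_R1.fq.gz and leave files without hyphens untouched.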
Create a file named Master_file.txt in the project folder. This file should specify the combinations and replicates. Refer to the example file provided for the expected layout.
Edit the config.yaml file to set the parameters required for the workflow:
# Enter organism name (Scientific name)
org: "Mus musculus"
# Enter Kegg organism code
org_code: "mmu"
# Specify Number of threads
threads: "15"
# Specify Combinations using "+" between combinations
combinations: "Tumor_Lung + Tumor_Liver + Lung_Liver"
# Reference Assembly version (Indexing command provided below)
reference: "<path/to/indexed/reference/folder>"
If the reference has not been indexed yet, build the index with the following STAR command:
STAR --runMode genomeGenerate --genomeDir {index_dir_name} --genomeFastaFiles {path to ".fasta" file} --sjdbGTFfile {path to ".gtf" file} --sjdbOverhang 100 --runThreadN 10
Navigate to the project folder in your terminal/command prompt.
Type the following command in the terminal:
snakemake --configfile=config.yaml --cores 5
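Before launching the command above, it may help to confirm that the project folder matches the layout described in this guide. The sketch below is an assumption-laden convenience, not part of the workflow: check_project is a hypothetical helper that looks for the five workflow files, the 1_Data folder, and any sample file names that still contain hyphens.

```shell
# Sketch: pre-flight check of the project folder layout.
# check_project is a hypothetical helper name; it returns 0 when the
# layout looks complete and 1 otherwise.
check_project() {
    local dir="${1:-.}"
    local problems=0
    local f

    for f in Snakefile DEGSeq_no_replicate_final.R create_combinations.R \
             config.yaml Master_file.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $f" >&2
            problems=1
        fi
    done

    if [ ! -d "$dir/1_Data" ]; then
        echo "missing folder: 1_Data" >&2
        problems=1
    fi

    # Flag sample files whose names still contain hyphens.
    for f in "$dir"/1_Data/*-*; do
        [ -e "$f" ] || continue             # the glob matched nothing
        echo "hyphen in file name: $f" >&2
        problems=1
    done

    return "$problems"
}
```

One possible use is to chain it with a Snakemake dry run (the -n flag lists the planned jobs without executing them): check_project . && snakemake -n --configfile=config.yaml --cores 5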