This workflow is built on the workflow management system snakemake and on conda, so we strongly recommend installing miniconda3 with python3. For detailed installation and usage instructions for RNApipe, please refer to RNApipe_Documentation.pdf.
Clone the repository:
git clone https://github.com/ywu019/RNApipe.git
Create the environment:
conda env create -n RNApipe -f envs/envs.yaml
Activate the environment:
conda activate RNApipe
Several input files are required to run the workflow: a genome file (.fa), an annotation file (.gff/.gtf), and compressed sequencing files (.fastq.gz).
File type | Description |
---|---|
genome.fa | user-provided genome file containing the genome sequence |
annotation.gtf | user-provided annotation file with genomic features |
.fastq.gz | user-provided compressed sequencing files |
config.yaml | configuration file to customize the workflow |
samples.tsv | sample file describing the relation between the input fastq files |
We recommend retrieving both the genome and the annotation files for your organism from the National Center for Biotechnology Information (NCBI). Note: if you use custom annotation files, ensure that they adhere to the GTF standard.
The sequencing files are provided by the user. Both single-end and paired-end data are supported. Note: please ensure that your files are gzip-compressed and use the .fastq.gz extension (for example, root_1_R1.fastq.gz for paired-end data or root_1.fastq.gz for single-end data).
Modify the metafile describing your data, configs/samples.tsv.
condition | replicate |
---|---|
root | 1 |
root | 2 |
root | 3 |
shoot | 1 |
shoot | 2 |
shoot | 3 |
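The link between the sample table and the file naming convention can be sketched in Python. This is a minimal illustration, not part of RNApipe: it assumes samples.tsv is tab-separated with a `condition` and a `replicate` column, and that the second mate of a paired-end sample uses an `_R2` suffix (only `_R1` is shown in the note on file naming). The function name `expected_fastq_names` is hypothetical.

```python
import csv
import io


def expected_fastq_names(samples_tsv_text, paired_end):
    """Return the FASTQ file names implied by each (condition, replicate) row."""
    reader = csv.DictReader(io.StringIO(samples_tsv_text), delimiter="\t")
    names = []
    for row in reader:
        stem = f"{row['condition']}_{row['replicate']}"
        if paired_end:
            # Assumption: the second mate is named with an _R2 suffix.
            names.extend([f"{stem}_R1.fastq.gz", f"{stem}_R2.fastq.gz"])
        else:
            names.append(f"{stem}.fastq.gz")
    return names


# Small in-memory example; a real run would read configs/samples.tsv instead.
example = "condition\treplicate\nroot\t1\nroot\t2\nshoot\t1\n"
print(expected_fastq_names(example, paired_end=False))
```

Comparing this list with the contents of the reads directory is a quick way to spot misnamed or missing files before starting the workflow.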
Customize the workflow to your needs in configs/config.yaml. It contains the following variables:
- PROJECT: Project name
- READSPATH: The path to fastq files
- SAMPLES: configs/samples.tsv
- END: Whether the sequencing data is paired-end or single-end
- OUTPUTPATH: The path for final outputs
- GENOME: The path of genome files
- ANNOTATION: The path of annotation files
- CONTROL: Control group in comparison
- TREAT: Treatment group in comparison
- SPEICE_ABBREVIATION: Abbreviation of the species name; look it up in species_abbreviation.tsv
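Before running the workflow, it can help to confirm that every variable listed above is set and that the path-like ones point at something that exists. The following is a minimal sketch, not RNApipe's own validation: it assumes the configuration has already been loaded into a Python dict (for example with a YAML parser), and the names `check_config`, `REQUIRED_KEYS`, and `PATH_KEYS` are hypothetical.

```python
import os

# Variable names taken from the list above.
REQUIRED_KEYS = [
    "PROJECT", "READSPATH", "SAMPLES", "END", "OUTPUTPATH",
    "GENOME", "ANNOTATION", "CONTROL", "TREAT", "SPEICE_ABBREVIATION",
]
# Assumption: these entries must refer to existing files or directories.
PATH_KEYS = ["READSPATH", "SAMPLES", "GENOME", "ANNOTATION"]


def check_config(config):
    """Return a list of human-readable problems found in the config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing value for {key}")
    for key in PATH_KEYS:
        path = config.get(key)
        if path and not os.path.exists(path):
            problems.append(f"{key} path does not exist: {path}")
    return problems


# An empty config reports every required variable as missing.
for problem in check_config({}):
    print(problem)
```

An empty list from `check_config` means all variables are present and the input paths resolve; it does not check that the values themselves are ones the workflow accepts.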
Run the workflow:
python main.py