/RNApipe

Primary LanguagePython

RNApipe

1. Installation

1.1 Install Miniconda

As this workflow is based on the workflow management system snakemake and conda. We strongly recommend installing miniconda3 with python3. For specific installation methods and usage methods of RNApipe, please refer to the RNApipe_Documentation.pdf.

1.2 Install RNApipe

Clone the repository:

git clone https://github.com/ywu019/RNApipe.git

Create the environment:

conda env create -n RNApipe -f envs/envs.yaml

Activate the environment:

conda activate RNApipe

2. Prepare input files

Several input files are required in order to run the workflow, a genome file (.fa), an annotation file (.gff/.gtf) and compressed sequencing files (.fastq.gz).

File type Description
genome.fa user-provided genome file containing the genome sequence
annotation.gtf user-provided annotation file with genomic features
.fastq.gz user-provided compressed sequencing files
config.yaml configuration file to customize the workflow
samples.tsv sample file describing the relation between the input fastq files

2.1 Annotation.gff and genome.fa

We recommend retrieving both the genome and the annotation files for your organism from National Center for Biotechnology Information (NCBI) Note: if you use custom annotation files, ensure that you adhere to the gtf standard

2.2 Input .fastq files

These are the input files provided by the user. Both single end and paired end data is supported. Note: Please ensure that you compress your files in .gz format and .fastq.gz (root_1_R1.fastq.gz if paired end data /root_1.fastq.gz if single end data )

2.3 Set up configuration

Modify the metafile describing your data configs/samples.tsv .

condition replicate
root 1
root 2
root 3
shoot 1
shoot 2
shoot 3

Customize the workflow based on your need in configs/config.yaml .It contains the following variables:

  • PROJECT: Project name
  • READSPATH: The path to fastq files
  • SAMPLES: configs/samples.tsv
  • END: sequencing paired-end or single-end
  • OUTPUTPATH: Thw path for final outputs
  • GENOME: The path of genome files
  • ANNOTATION: The path of annotation files
  • CONTROL: Control group in comparison
  • TREAT: Processing groups in comparison
  • SPEICE_ABBREVIATION: Abbreviation of species name,Query in the species_abbreviation.tsv

3. Run RNApipe

python main.py