/mirna_mite

A pipeline to search and analyze miRNAs and Transposable elements in Wheat. Supplementary codes for an upcoming publication

Primary LanguageJupyter Notebook

This repo is divided into steps:

  • 1_data_ss

    • 1_mrcv_ss.sh Will run shortstack in "Mal de Rio Cuarto" smallRNA data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5423983/
    • 2_sun_get_data.sh Will download smallRNA data from Sun et al. https://bmcplantbiol.biomedcentral.com/articles/10.1186/1471-2229-14-142
    • 3_sun_filter.sh Will filter sun adapters
    • 4_sun_ss.sh and 5_sun_ss_separated.sh will run shortstack with Sun et al. data. Only results from 4 will be used in further analysis.
    • 6_transcript_name_fixer.py will remove descriptions from wheat cDNA because of a Cleaveland requirement
    • 7_sun_deg_filter.sh Will filter Sun et al. degradome data
    • 8_sun_cleave.sh Run cleaveland with Sun et al. data
  • 1_mites

    • create_unique_consensous.ipynb Takes a fasta file with MITEs sequences and creates one consensous file using vsearch
  • 2_analysis

    • 1_analysis.ipynb Will do the main analysis and counting in miRNA production sites using Shortstack outputs, MITEs blast and genomic annotation

    • 2_cleavage.ipynb Uses Cleaveland and psRNATargetFinder outputs to analyse cleavage of targets data

  • Other scripts used in this pipeline:

BLAST MITEs against wheat genome

blastn -task blastn -query <mites_files.fasta> \
-subject Triticum_aestivum.IWGSC.dna.toplevel.fa \
-outfmt  "6 qseqid sseqid qstart qend \ 
sstart send mismatch gaps pident evalue length \ 
qlen slen qcovs" -evalue 1e-10 > <mites_results.csv>

Trim smallRNA libraries (sun et al.)

trim_galore <lib.fastq.gz> -o <lib.trimmed.fastq.gz> \
--adapter TCGTATGCCGTCTTCTGCTTG --max_length 30 --length 18

Shortstack smallRNA library / wheat genome

ShortStack --readfile <lib.trimmed.fastq.gz> ... \
--genomefile Triticum_aestivum.IWGSC.dna.toplevel.fa \
--sort_mem 9G --foldsize 1000 --sort_mem 9G

DEGs

Get DEGs C vs T Get miRNA DAT C vs T