Scripts related to DHX-Alu project

DHX9-Alu

Scripts and pipelines corresponding to the article: "DHX9 suppresses RNA processing defects originating from the Alu invasion of the human genome" (Aktas et al., Nature 2017).

This directory contains scripts related to:

  1. Repeat enrichment after mapping to a library of repeat sequences.
  2. Repeat clustering and annotation from fasta files.
  3. RNA-Seq analysis of DHX9 knockdown.

Repeat enrichment after mapping to a library of repeat sequences

We followed the strategy described in Dey et al., 2010 to calculate the enrichment of repeat classes from DHX9 uvCLAP data for the human (hg38), mouse (mm10) and fly (dm6) genomes. In short:

  • A repeat library was created using sequences of canonical repeats and repeat instances. Script

  • uvCLAP reads were mapped to the library using BWA, and counts unique to each repeat class were obtained.

  • Mapping was done again using bowtie2, and counts from BWA were normalized by library sizes obtained from the bowtie2 mapping. Script

  • Maximum likelihood estimates (MLEs) were calculated for each repeat class in each sample. Scripts

  • All samples were clustered using the maximum likelihood estimates and plotted. Scripts
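
The library-size normalization step above can be illustrated with a minimal Python sketch. It is not the script from this repository: the function name `normalize_counts`, the per-million scaling factor, and the example counts are all assumptions chosen for illustration. It only shows the general idea of dividing per-class counts (e.g. from BWA) by a library size (e.g. from the bowtie2 mapping).

```python
def normalize_counts(counts, library_size, scale=1_000_000):
    """Scale raw per-class read counts to counts per `scale` mapped reads.

    counts       -- dict mapping repeat class name to raw read count
    library_size -- total number of mapped reads for the sample
    scale        -- hypothetical scaling factor (per million by default)
    """
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {rep: n * scale / library_size for rep, n in counts.items()}


# Made-up example values, not data from the study.
raw = {"AluY": 500, "L1": 250}
norm = normalize_counts(raw, library_size=2_000_000)
```

The enrichment estimates themselves follow the cited strategy and are not reproduced here.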

Repeat clustering and annotation from fasta files

We followed a graph-based clustering strategy originally described in Novák et al., 2010, and implemented as a pipeline in Novák et al., 2013. In short:

  • We sampled 100,000 fastq reads from the uvCLAP experiments and converted them to fasta.

  • We ran the RepeatExplorer pipeline, obtained from Bitbucket, using seqclust_cmd.py.

  • Alu/B1 SINE IDs were extracted from RepeatMasker-annotated clusters (Script), and SeqGrapheR was used to visualize the clusters.
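
The read-sampling step above can be sketched in Python as follows. This is one possible approach, not necessarily how the reads were sampled for the article: it assumes plain 4-line FASTQ records and uses reservoir sampling so the whole file never has to be held in memory. The names `read_fastq` and `sample_to_fasta`, the fixed seed, and the two demo reads are hypothetical.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or blank line)
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield header[1:], seq  # drop the leading '@'


def sample_to_fasta(fastq_handle, n, seed=0):
    """Uniformly sample up to n reads and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(fastq_handle)):
        if i < n:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


# Tiny in-memory demo; a real run would open a FASTQ file and use n=100_000.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
fasta = sample_to_fasta(demo, n=5)
```

When fewer than `n` reads are available, all of them are returned in their original order.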

RNA-Seq analysis of DHX9 knockdown

The directory 04_RNASeq contains scripts corresponding to:

  1. Differential expression and splicing analysis: here.
  2. CircRNA detection: here.
  3. Identification of potential RNA-editing events: here.