Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

Code for SURF Analysis of ENCODE Data

This repository makes available the source code for our SURF paper(s). Last updated in April 2020.

The paper presents the Statistical Utility for RBP Functions (SURF) for integrative analysis of RNA-seq and CLIP-seq data. The goal of SURF is to identify alternative splicing (AS), alternative transcription initiation (ATI), and alternative polyadenylation (APA) events regulated by individual RNA-binding proteins (RBPs), and to elucidate the protein-RNA interactions governing these events. We apply the SURF pipeline to 104 RBP data sets from ENCODE. The results can be browsed in the accompanying Shiny app.

The current repository includes:

  • application/
    • xena.R: process TCGA and GTEx transcriptome data.
    • encode_surf_one.R: perform SURF analysis for one RBP. This is used for all 104 RBPs.
    • encode_surf_summary.R: summarize the SURF results, including all the statistics and plots reported in the paper.
  • simulation/
    • other_simulation.sh: prepare DEXSeq and run rMATS and MAJIQ.
    • drseq_simulation.R: run DrSeq and DEXSeq, then analyze the simulation results, including all the statistics and plots reported in the paper.
    • majiq/: contains two files needed for running MAJIQ.
    • dexseq/: contains two files needed for DEXSeq preparation.

To reproduce the ENCODE data analysis (the results are available at DOI 10.5281/zenodo.3779037):

  1. Download the processed BAM files (shRNA-seq and eCLIP-seq) from the ENCODE portal.
  2. Download transcriptome quantification of TCGA and GTEx projects from Xena.
  3. Run xena.R, encode_surf_one.R (for each RBP), and encode_surf_summary.R in order.
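
The run order above can be sketched as a small shell driver. This is only an illustration under stated assumptions: it assumes the scripts are invoked with `Rscript` from the repository root, and that `encode_surf_one.R` takes the RBP name as a positional argument, which this README does not confirm (check the script for how it selects the RBP and locates its input files). The names `run`, `reproduce_encode`, `DRY_RUN`, `RBP_A`, and `RBP_B` are hypothetical. By default the commands are printed rather than executed.

```shell
# Print a command, and execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Run the three ENCODE analysis steps in order, stopping at the first failure.
# Arguments: RBP names (placeholders here; the real list is not given in this README).
reproduce_encode() {
  run Rscript application/xena.R || return 1
  for rbp in "$@"; do
    # ASSUMPTION: the RBP name is passed as a positional argument.
    run Rscript application/encode_surf_one.R "$rbp" || return 1
  done
  run Rscript application/encode_surf_summary.R || return 1
}

# Dry run with two placeholder RBP names.
reproduce_encode RBP_A RBP_B
```

Setting `DRY_RUN=0` executes the commands instead of printing them. Because each per-RBP run is issued as a separate command, the loop could also be replaced by a job scheduler if the runs turn out to be independent.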

To reproduce the simulation results:

  1. Download the processed BAM files (Homo sapiens) from ArrayExpress dataset E-MTAB-3766.
  2. Run other_simulation.sh and drseq_simulation.R in order.
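
As a rough sketch, the two simulation steps could be chained so that the R analysis starts only if the shell step succeeds. This assumes both scripts are run from the repository root without arguments and that `other_simulation.sh` is a bash script; neither is stated in this README. The names `reproduce_simulation` and `DRY_RUN` are hypothetical, and by default the commands are only printed.

```shell
# Run the simulation steps in order; the second runs only if the first succeeds.
# Set DRY_RUN=0 to execute; by default the commands are only printed.
reproduce_simulation() {
  for cmd in "bash simulation/other_simulation.sh" \
             "Rscript simulation/drseq_simulation.R"; do
    echo "+ $cmd"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      # ASSUMPTION: no arguments are needed; $cmd is left unquoted on purpose
      # so that it splits into the program name and the script path.
      $cmd || return 1
    fi
  done
}

reproduce_simulation
```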

Contact

Fan Chen (fan.chen@wisc.edu) or Sunduz Keles (keles@stat.wisc.edu)

Reference

Chen F and Keles S. “SURF: integrative analysis of a compendium of RNA-seq and CLIP-seq datasets highlights complex governing of alternative transcriptional regulation by RNA-binding proteins.”