This repository contains the source code for our SURF paper(s). Last updated in April 2020.
The paper presents the Statistical Utility for RBP Functions (SURF) for integrative analysis of RNA-seq and CLIP-seq data. The goal of SURF is to identify alternative splicing (AS), alternative transcription initiation (ATI), and alternative polyadenylation (APA) events regulated by individual RNA-binding proteins (RBPs), and to elucidate the protein-RNA interactions governing these events. We apply the SURF pipeline to 104 RBP data sets from ENCODE. Browsable results are available in an accompanying Shiny app.
The current repository includes:
- `application/`
  - `xena.R`: process TCGA and GTEx transcriptome data.
  - `encode_surf_one.R`: perform SURF analysis for one RBP. This script is used for all 104 RBPs.
  - `encode_surf_summary.R`: summarize the SURF results, including all the statistics and plots reported in the paper.
- `simulation/`
  - `other_simulation.sh`: prepare DEXSeq inputs and run rMATS and MAJIQ.
  - `drseq_simulation.R`: run DrSeq and DEXSeq and analyze the simulation results, including all the statistics and plots reported in the paper.
  - `majiq/`: contains two files needed for running MAJIQ.
  - `dexseq/`: contains two files needed for DEXSeq preparation.
To reproduce the ENCODE data analysis (the results are available at DOI 10.5281/zenodo.3779037):
- Download the processed bam files (shRNA-seq and eCLIP-seq) from ENCODE portal.
- Download transcriptome quantification of TCGA and GTEx projects from Xena.
- Run `xena.R`, `encode_surf_one.R` (once for each RBP), and `encode_surf_summary.R`, in that order.
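The three steps above can be chained in a small shell driver. The sketch below is not part of the repository and rests on assumptions you should check against the scripts themselves: the scripts are run with `Rscript` from inside `application/`, and `encode_surf_one.R` is assumed (hypothetically) to accept the RBP name as its first command-line argument. The helper names `run_cmd` and `run_encode_pipeline` and the `DRY_RUN` switch are illustrative only.

```shell
#!/bin/sh
# Hypothetical driver for the ENCODE analysis in application/ (not in the repo).
# ASSUMPTIONS: run from inside application/ with Rscript on the PATH, and
# encode_surf_one.R takes the RBP name as its first argument (unverified).
# Set DRY_RUN=1 to print each command instead of executing it.

# Execute a command, or just print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: run_encode_pipeline RBP [RBP ...]
# Stops at the first step that fails and returns a non-zero status.
run_encode_pipeline() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_encode_pipeline RBP [RBP ...]" >&2
    return 1
  fi
  # Step 1: process the TCGA and GTEx transcriptome data downloaded from Xena.
  run_cmd Rscript xena.R || return 1
  # Step 2: run the SURF analysis once per RBP.
  for rbp in "$@"; do
    run_cmd Rscript encode_surf_one.R "$rbp" || return 1
  done
  # Step 3: summarize the results across all RBPs.
  run_cmd Rscript encode_surf_summary.R
}
```

For example, `DRY_RUN=1 run_encode_pipeline RBP_A RBP_B` (placeholder RBP names) prints the four commands in execution order without running them; dropping `DRY_RUN=1` executes them and stops at the first failing step.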
To reproduce the simulation results:
- Download the processed bam files (Homo sapiens) from ArrayExpress dataset E-MTAB-3766.
- Run `other_simulation.sh` and `drseq_simulation.R`, in that order.
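The two simulation steps can be wrapped the same way. This is a hedged sketch, not the authors' tooling: it assumes the commands are run from inside `simulation/`, that the E-MTAB-3766 BAM files are already where the scripts expect them, that `other_simulation.sh` runs under `bash`, and that neither script needs command-line arguments. The names `run_cmd`, `run_simulation_pipeline`, and `DRY_RUN` are hypothetical.

```shell
#!/bin/sh
# Hypothetical driver for the simulation study in simulation/ (not in the repo).
# ASSUMPTIONS: run from inside simulation/, BAM files already downloaded,
# other_simulation.sh runs under bash, and no script takes arguments.
# Set DRY_RUN=1 to print each command instead of executing it.

# Execute a command, or just print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Runs both steps in order; stops if the first one fails.
run_simulation_pipeline() {
  # Step 1: prepare DEXSeq inputs and run rMATS and MAJIQ.
  run_cmd bash other_simulation.sh || return 1
  # Step 2: run DrSeq and DEXSeq, then analyze the simulation results.
  run_cmd Rscript drseq_simulation.R
}
```

Running `DRY_RUN=1 run_simulation_pipeline` prints the two commands in order; without `DRY_RUN=1`, the R step runs only if the shell step succeeded, since its outputs are presumably needed downstream.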
Contact: Fan Chen (fan.chen@wisc.edu) or Sunduz Keles (keles@stat.wisc.edu).
Reference: Chen F and Keles S. “SURF: integrative analysis of a compendium of RNA-seq and CLIP-seq datasets highlights complex governing of alternative transcriptional regulation by RNA-binding proteins.”