This repository contains the analysis code for the publication found here, which aimed to improve the annotation of disease-relevant genes using RNA-sequencing data. The accompanying web-based tool, vizER, can be used to visualise individual genes of interest for evidence of incomplete annotation.
If you use any part of the code in this repository, please cite the Science Advances publication: DOI 10.1126/sciadv.aay8299.
Directory | Description |
---|---|
analyse_ER_annotation | ER-related analysis, including quantifying OMIM gene re-annotation and total ER Mb across annotation features, and validating ERs across Ensembl versions and within an independent dataset |
annotate_ERs | Annotating ERs with metrics such as association to genes through junctions, annotation features, conservation and constraint |
check_protein_coding_potential | Checking the protein-coding potential of ERs |
complex_disorders | Re-annotation of GWAS hits from STOPGAP |
download_tidy_OMIM_data | Download details of Mendelian disease genes via OMIM API |
export_ER_details | Formatting ER details for publication |
generate_ERs_varying_cut_offs_maxgaps_GTEx_tissues | Using derfinder to define tissue-specific expressed regions (ERs) for each GTEx tissue* |
generate_randomised_intron_inter_regions | Generating tissue-specific randomised length-matched regions |
GTEx_split_read_reformatting | Re-formatting the raw GTEx junction data downloaded from recount2 for input into annotatER |
optimising_derfinder_cutoff | Optimising the definitions of ERs using a gold-standard set of non-overlapping exons* |
*These elements of the pipeline have been wrapped into an R package that can be found here.