/brain_cDNA_discovery

Primary language: Jupyter Notebook. License: MIT.

Mapping medically relevant RNA isoform diversity in the aged human frontal cortex with deep long-read RNA-seq

This repository contains all code and documentation used for the analysis contained in the article above.

Repository structure:

article_analysis - Scripts used for data analysis and figure generation for the article. Uses output from illumina_pipeline and nanopore_pipeline. Some figures were created using the scripts in website.

illumina_pipeline - In-house Nextflow pipeline optimized for analysis of Illumina paired-end short-read sequencing data.

nanopore_pipeline - In-house Nextflow pipeline optimized for analysis of Oxford Nanopore PCR-amplified cDNA sequencing data.

proteomics - Analysis pipeline to validate new transcripts at the protein level using publicly available mass spectrometry data. Also explains the downstream analysis steps and contains the custom code used for downstream analysis and figure generation.

singularity_containers - Directory with container definition files and pull commands. With the exception of the FragPipe pipeline (proteomics_pipeline) and the Rshiny web app (website), all the software used in this GitHub repository is in these Singularity containers.

website - Contains Rshiny app scripts that allow users to perform gene queries and visualize RNA isoform expression from the data used in this publication. Access the website here.

Data availability

Raw nanopore sequencing FASTQ files generated in this study are available here. They are also available through the NIH SRA (accession number: SRP456327).

Proteomics (mass spec) data from cell lines used in this experiment are publicly available here. For more information about this data see: https://pubmed.ncbi.nlm.nih.gov/36959352/

Proteomics (mass spec) data from round 2 of the ROSMAP TMT brain proteomics are publicly available here. For more information about this data see: https://www.nature.com/articles/s41597-020-00650-8

Final output files from transcriptomics/RNAseq and proteomics analysis and annotations/references used in this study are available here

GTEx long-read RNAseq data used for validation of our study results is available here

ROSMAP short-read RNAseq data used for validation of our study results is available here

More information

Each directory within this GitHub repository contains documentation for the analysis performed in that directory. If you have any questions, please submit an issue.