Analysis of gene expression and splicing diversity in a subset of samples from the 1000 Genomes Project, including eQTL and sQTL discovery and annotation.

MAGE: Multi-ancestry Analysis of Gene Expression

MAGE comprises RNA-seq data from lymphoblastoid cell lines derived from 731 individuals from the 1000 Genomes Project (1KGP), representing 26 globally distributed populations across five continental groups. These data offer a large, geographically diverse, open-access resource to facilitate studies of the distribution, genetic underpinnings, and evolution of variation in human transcriptomes, and they include data from several ancestry groups that were poorly represented in previous studies.

Data Access

Raw reads

Newly generated RNA sequencing data for the 731 individuals (779 total libraries) are available on the Sequence Read Archive (Accession: PRJNA851328).
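As a rough illustration of how the run list for this accession might be retrieved programmatically, the sketch below queries the ENA Portal API `filereport` endpoint. It is not part of the MAGE pipeline. It assumes the project is mirrored at ENA (as INSDC submissions generally are), and the function names and the choice of fields are illustrative only.

```python
import csv
import io
import urllib.parse
import urllib.request

# ENA Portal API endpoint that returns per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_run_table_url(accession="PRJNA851328"):
    """Return an ENA filereport URL listing the sequencing runs of a project."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        # Assumed choice of columns; other ENA read_run fields can be requested.
        "fields": "run_accession,sample_alias,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_run_table(tsv_text):
    """Parse the tab-separated run table into a list of dicts, one per run."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def fetch_run_table(accession="PRJNA851328"):
    """Download and parse the run table (requires network access)."""
    with urllib.request.urlopen(build_run_table_url(accession)) as response:
        return parse_run_table(response.read().decode("utf-8"))
```

Calling `fetch_run_table()` should return one record per sequencing run. The reads themselves can then be downloaded from the listed FASTQ locations or with the SRA Toolkit (`prefetch`, `fasterq-dump`).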

Processed data

Processed gene expression matrices and QTL mapping results (as well as a host of other downstream data) are currently available on Zenodo (MAGEv1.0 Zenodo link) as well as Dropbox (MAGEv1.0 Dropbox link).

Briefly, the processed data repository contains the following data:

  1. Sample metadata and sequencing metrics
  2. Gene expression and splicing matrices used for e/sQTL mapping and analyses of global trends of expression/splicing diversity
  3. cis-e/sQTL mapping results, including allelic fold change (aFC) estimates for cis-eQTLs
  4. Functional annotations of cis-e/sQTLs
  5. Results of colocalization analysis between MAGE e/sQTLs and complex trait GWAS from the PAGE study
  6. Results of analyses of global trends of expression/splicing diversity
  7. Jointly generated top genotype PCs for samples in MAGE and other resources with paired WGS/RNA-seq data (Geuvadis, GTEx, AFGR)

READMEs are provided for all data in the repo.

If you have trouble accessing these data, please contact us to explore other options (e.g., Globus).

Variant calls

The high-coverage variant calls used for QTL mapping were previously generated by the New York Genome Center (NYGC) and are available through the 1KGP FTP site.

Code

Code used for data processing and downstream analyses is available in the analysis_pipeline/ directory, along with READMEs describing how to run each script.

Code used to produce the major figures and panels in the manuscript is available in the figure_generation/ directory.

The MAGE manuscript

For more information about the MAGE resource and the analyses performed with it, please see our paper:

Sources of gene expression variation in a globally diverse human cohort
Dylan J. Taylor, Surya B. Chhetri, Michael G. Tassia, Arjun Biddanda, Stephanie M. Yan, Genevieve L. Wojcik, Alexis Battle, Rajiv C. McCoy

Citing MAGE

If you use MAGE data in your own work, please cite the paper linked above.