Mendelian randomization analysis of different measures of adiposity and metabolites

This repo provides scripts and details of the data required to perform a Mendelian randomization (MR) analysis of different measures of adiposity and metabolites. We use different measures of adiposity as exposures and metabolites as outcomes.


Brief introduction to MR

MR is an analysis tool in which genetic variants, typically single nucleotide polymorphisms (SNPs), are used as instruments to proxy for exposures of interest. Alleles are randomly allocated at conception, so confounders should be distributed evenly across genotype groups. Using SNPs as proxies therefore gives estimates of the causal effect of X (exposure) on Y (outcome) that are less prone to confounding than conventional observational estimates, provided the MR assumptions hold. MR is discussed at length elsewhere: Davey Smith & Ebrahim (2003), Davey Smith & Hemani (2014), Pierce & Burgess (2013).
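For a single SNP, the simplest MR estimate is the Wald ratio: the SNP-outcome association divided by the SNP-exposure association. The following is a minimal base-R sketch using made-up summary statistics and the common first-order standard error. `wald_ratio()` is a hypothetical helper, not code from this repo; TwoSampleMR provides its own `mr_wald_ratio()`.

```r
# Wald ratio: causal effect of X on Y estimated from a single SNP Z.
# beta_zx: SNP-exposure association; beta_zy, se_zy: SNP-outcome association
# and its standard error. The first-order SE ignores uncertainty in beta_zx.
wald_ratio <- function(beta_zx, beta_zy, se_zy) {
  b  <- beta_zy / beta_zx
  se <- se_zy / abs(beta_zx)
  p  <- 2 * pnorm(abs(b / se), lower.tail = FALSE)
  list(b = b, se = se, p = p)
}

# Made-up summary statistics for one SNP
est <- wald_ratio(beta_zx = 0.05, beta_zy = 0.01, se_zy = 0.004)
est$b   # 0.2
est$se  # 0.08
```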


Analysis platform

For this work, unless otherwise stated, we used MR-Base to perform our analysis, specifically the associated TwoSampleMR R package. Full details of the MR-Base platform can be found in Hemani et al. (2018). In brief, MR-Base is a curated database of genome-wide association study (GWAS) results, together with applications for performing two-sample MR. In two-sample MR, the SNP-exposure and SNP-outcome associations used in the analysis are estimated in two separate samples.
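Because the two sets of associations come from different samples, they must be matched by SNP and aligned so that both betas refer to the same effect allele. The sketch below shows this in base R, assuming TwoSampleMR-style column names. `harmonise_simple()` is hypothetical and simpler than `harmonise_data(action = 2)`: it drops palindromic SNPs instead of inferring their strand from allele frequencies.

```r
# Two-sample MR input: SNP-exposure associations (sample 1) and SNP-outcome
# associations (sample 2) matched by SNP and aligned to the same effect allele.
# Column names mimic TwoSampleMR; harmonise_simple() is a hypothetical helper.
complement <- function(a) chartr("ACGT", "TGCA", a)

harmonise_simple <- function(exposure, outcome) {
  dat  <- merge(exposure, outcome, by = "SNP")
  ea_x <- dat$effect_allele.exposure
  oa_x <- dat$other_allele.exposure
  ea_y <- dat$effect_allele.outcome
  oa_y <- dat$other_allele.outcome

  # Palindromic SNPs (A/T, C/G) cannot be strand-resolved from alleles alone
  pal <- ea_x == complement(oa_x)

  # No direct allele match: assume the outcome is reported on the other strand
  match_direct <- (ea_y == ea_x & oa_y == oa_x) | (ea_y == oa_x & oa_y == ea_x)
  flip_strand  <- !match_direct & !pal
  ea_y[flip_strand] <- complement(ea_y[flip_strand])
  oa_y[flip_strand] <- complement(oa_y[flip_strand])

  same    <- ea_y == ea_x & oa_y == oa_x
  swapped <- ea_y == oa_x & oa_y == ea_x

  # Outcome effect allele is the exposure other allele: flip the outcome beta
  dat$beta.outcome[swapped] <- -dat$beta.outcome[swapped]
  dat$effect_allele.outcome <- ifelse(same | swapped, ea_x, ea_y)
  dat$other_allele.outcome  <- ifelse(same | swapped, oa_x, oa_y)

  # Keep aligned, non-palindromic SNPs (palindromes dropped for simplicity)
  dat$mr_keep <- (same | swapped) & !pal
  dat[dat$mr_keep, ]
}

# Made-up data: rs1 aligned, rs2 alleles swapped, rs3 palindromic (dropped),
# rs4 reported on the opposite strand
exposure <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  effect_allele.exposure = c("A", "C", "A", "G"),
  other_allele.exposure  = c("G", "T", "T", "A"),
  beta.exposure = c(0.05, 0.03, 0.04, 0.02),
  stringsAsFactors = FALSE
)
outcome <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  effect_allele.outcome = c("A", "T", "A", "C"),
  other_allele.outcome  = c("G", "C", "T", "T"),
  beta.outcome = c(0.010, 0.006, 0.008, -0.004),
  se.outcome   = c(0.004, 0.003, 0.004, 0.003),
  stringsAsFactors = FALSE
)
res <- harmonise_simple(exposure, outcome)
res[, c("SNP", "beta.exposure", "beta.outcome")]
```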



Exposure data

The different measures of adiposity used for this analysis are:


  • Yengo_941

    • SNPs are from Yengo et al. (2018) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
    • UK Biobank & GIANT consortium
    • European
    • n = 515509 - 795624
    • 941 SNPs at 5e-8
  • Yengo_646

    • SNPs are from Yengo et al. (2018) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
    • UK Biobank & GIANT consortium
    • European
    • n = 515509 - 795624
    • 646 SNPs at 1e-8
  • Locke_77

    • SNPs are from Locke et al. (2015) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
    • GIANT consortium
    • European
    • n =
    • 77 SNPs at 5e-8





These data are required to run the analysis described here. All other data, namely the outcome data, are available through the MR-Base platform.


Outcome data

The different metabolite data used for this analysis are:


Kettunen et al. (2016) metabolite data:

  • Consortium of 14 studies
  • European
  • n = 24,925
  • 123 metabolites profiled using NMR
    • The metabolite set covers multiple metabolic pathways, including lipoprotein lipids and subclasses, fatty acids as well as amino acids and glycolysis precursors.
  • 74 independent associations
    • 62 independent loci identified in the meta-analysis
      • 9 additional independent secondary associations within the 62 loci
      • 2 additional independent tertiary associations within the 9
      • 1 additional independent quaternary association within the 2
  • MR-Base ID = 838:960


Shin et al. (2014) metabolite data:

  • Two studies: KORA and TwinsUK
  • European
  • n = 7,824
  • 452 metabolites
    • 177 unknown
  • 299 SNP-metabolite associations
    • 145 independent SNPs
  • MR-Base ID = 303:754
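The MR-Base IDs above use R's colon notation for integer ranges. Expanding each range is a quick check that each one gives exactly one ID per metabolite. This is a sketch, not part of the repo's scripts. These are legacy numeric IDs from the original MR-Base catalogue. The current IEU OpenGWAS database uses string-based IDs, so they may need mapping before reuse. `exposure_dat` in the commented call is a hypothetical name.

```r
# Legacy numeric MR-Base outcome IDs, written as R integer ranges
kettunen_ids <- 838:960   # Kettunen et al. (2016)
shin_ids     <- 303:754   # Shin et al. (2014)

length(kettunen_ids)  # 123: one ID per metabolite
length(shin_ids)      # 452: one ID per metabolite
outcome_ids <- c(kettunen_ids, shin_ids)

# These would then be passed to TwoSampleMR, e.g. (requires MR-Base access):
# outcome_dat <- extract_outcome_data(snps = exposure_dat$SNP,
#                                     outcomes = outcome_ids)
```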


File structure

002_adiposity_metabolites
 ¦--mrbase.oauth
 ¦--analysis
 ¦   ¦--BMI
 ¦   ¦   °--plots
 ¦   ¦--BF
 ¦   ¦   °--plots
 ¦   °--WHR
 ¦       °--plots
 ¦--data
 ¦--environment
 ¦   °
 ¦--output
 °--scripts
     ¦--log
     ¦--.Renviron
     ¦--step1_BF_metabolites_MR.R
     ¦
     ¦--step1_BMI_metabolites_MR.R
     ¦
     ¦--step1_WHR_metabolites_MR.R
     °
  • analysis - where outputs from scripts are stored
  • data - where the input data read by the scripts are stored
  • environment - where my script with the global path is stored
  • output - where the final publishable results for this project are stored
  • scripts - where all scripts for this project are housed
    • I run all of my scripts from this directory using the following command in Terminal: qsub -d ./ (the -d option sets the job's working directory)
  • log - where the error and output log files from my submitted jobs are stored
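The directory skeleton above can be recreated from R. The sketch below is a minimal example under stated assumptions: `make_project_dirs()` is a hypothetical helper, and the root defaults to a temporary directory. In practice the root would be the global path. Files such as mrbase.oauth and .Renviron are not created.

```r
# Recreate the project's directory skeleton under a root directory.
# make_project_dirs() is hypothetical; dir.create() takes one path at a time.
make_project_dirs <- function(root) {
  dirs <- c(
    file.path("analysis", c("BMI", "BF", "WHR"), "plots"),
    "data", "environment", "output",
    file.path("scripts", "log")
  )
  for (d in file.path(root, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(file.path(root, dirs))
}

root <- file.path(tempdir(), "002_adiposity_metabolites")
make_project_dirs(root)
list.dirs(root, full.names = FALSE)
```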



  1. I run all of my R scripts using the University of Bristol high performance computer BlueCrystal 3, submitting them as jobs using .sh files
  2. First, I create a file (in the environment directory) containing my global file path.
    • I call this file from my .sh submission scripts to set my working directory and all subsequent directories.
  3. Second, I create a .Renviron file in the scripts directory (where all of my R scripts are stored) containing the same global file path as the environment file.
    • I call the .Renviron file from my .R scripts to set my working directory and all subsequent directories using Sys.getenv().
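A .Renviron file holds KEY=value lines. Unless R_ENVIRON_USER is set, R looks for .Renviron at start-up first in the current working directory and then in the home directory. Jobs submitted from scripts with qsub -d ./ should therefore pick up the project path. The sketch below loads a file explicitly with `readRenviron()` so it can run anywhere. The variable name `PROJECT_HOME` and the path are hypothetical.

```r
# Write a .Renviron with a hypothetical PROJECT_HOME variable and load it
renviron <- file.path(tempdir(), ".Renviron")
writeLines("PROJECT_HOME=/path/to/002_adiposity_metabolites", renviron)
readRenviron(renviron)

# Build analysis-specific output paths from the global path
home         <- Sys.getenv("PROJECT_HOME")
analysis_dir <- file.path(home, "analysis", "BMI")
plots_dir    <- file.path(analysis_dir, "plots")
plots_dir  # "/path/to/002_adiposity_metabolites/analysis/BMI/plots"
```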



  • All of my scripts for analysis are in scripts
  • All R scripts save their outputs to the analysis-specific subdirectory of analysis
    • E.g. step1_BMI_metabolites_MR.R will output to ../analysis/BMI/, and any plots generated will output to ../analysis/BMI/plots/
  • All .sh submission files produce error and output files which output to ../scripts/log/
  1. First, from within the scripts directory, I submit step1_??? (where ??? is BMI, BF or WHR) as a job to BlueCrystal 3
    • This job calls the R script of the same name i.e. step1_???_metabolites_MR.R, which performs the analysis of the adiposity measure and the two metabolite data sets

    • step1_???_metabolites_MR.R script explanation

      1. Load relevant libraries and set the environment
      2. Identify outcomes of interest in the MR-Base catalogue
      3. Read in exposure data using read_exposure_data()
      4. Extract exposure SNPs from the outcome data using extract_outcome_data() with the following arguments:
        • proxies - look for LD proxies = yes
        • rsq - if proxies = yes, minimum LD R^2 value = 0.8
        • align_alleles - if proxies = yes, try to align proxies to target alleles = yes
        • palindromes - if proxies = yes, allow palindromic SNPs = yes
        • maf_threshold - MAF threshold to try to infer palindromic SNPs = 0.3
        • maf_threshold - MAF threshold to try to infer palindromic SNPs = 0.3
      5. Harmonise alleles between exposure and outcome using harmonise_data() with the following argument:
        • action = 2 - Try to infer positive strand alleles, using allele frequencies for palindromes
      6. Perform MR analysis using mr()
        • method_list sets the methods to use for the MR analysis
      7. Additional tests:
        • mr_singlesnp() - obtain the MR estimate for each SNP individually, for each outcome (by default using the Wald ratio). The function also reports the MR estimate using all SNPs (by default IVW and MR Egger)
        • mr_heterogeneity() - obtain MR heterogeneity statistics
        • mr_pleiotropy_test() - obtain the MR Egger intercept for each analysis to assess directional horizontal pleiotropy
      8. Save results
  2. Second, I correct for multiple testing within each analysis using a Bonferroni threshold:
    • 0.05/123 for the Kettunen metabolite data outcome (123 metabolites)
    • 0.05/275 for the Shin metabolite data outcome (the 275 known metabolites, i.e. 452 minus the 177 unknown)
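The estimates returned by mr() and the statistics from mr_heterogeneity() and mr_pleiotropy_test() can be reproduced approximately in base R. The sketch below uses the standard weighted-regression formulations on made-up, already-harmonised data. `mr_sketch()` is hypothetical and is not the TwoSampleMR code, whose details (for example, standard-error scaling) may differ.

```r
# IVW, MR Egger and Cochran's Q from harmonised summary statistics.
# b_exp: SNP-exposure betas; b_out, se_out: SNP-outcome betas and SEs.
mr_sketch <- function(b_exp, b_out, se_out) {
  w <- 1 / se_out^2

  # IVW: weighted regression of b_out on b_exp through the origin.
  # The SE is never below the fixed-effect SE and is inflated by the
  # residual SE when there is over-dispersion (multiplicative random effects).
  ivw_fit <- summary(lm(b_out ~ 0 + b_exp, weights = w))
  ivw_b   <- ivw_fit$coefficients[1, 1]
  ivw_se  <- ivw_fit$coefficients[1, 2] / min(1, ivw_fit$sigma)

  # Cochran's Q for the IVW fit: weighted squared residuals, df = nsnp - 1
  q    <- sum(w * (b_out - ivw_b * b_exp)^2)
  q_df <- length(b_exp) - 1
  q_p  <- pchisq(q, q_df, lower.tail = FALSE)

  # MR Egger: orient SNPs so every SNP-exposure beta is positive, then fit
  # with an intercept; a non-zero intercept suggests directional pleiotropy
  y <- b_out * sign(b_exp)
  x <- abs(b_exp)
  egger_fit <- summary(lm(y ~ x, weights = w))

  list(
    ivw_b = ivw_b, ivw_se = ivw_se,
    q = q, q_df = q_df, q_p = q_p,
    egger_b = egger_fit$coefficients[2, 1],
    egger_intercept = egger_fit$coefficients[1, 1]
  )
}

# Made-up harmonised data for five SNPs
b_exp  <- c(0.05, 0.03, -0.04, 0.06, 0.02)
b_out  <- c(0.011, 0.005, -0.009, 0.013, 0.003)
se_out <- c(0.004, 0.003, 0.004, 0.005, 0.003)
res <- mr_sketch(b_exp, b_out, se_out)
str(res)
```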



  • All figures show the raw beta estimates for the MR analysis
  • Order of metabolites
    • Metabolites are first ordered alphabetically by class
    • Within each class, metabolites are then ordered from lowest to highest IVW beta
  • Axis
    • The solid pink line always represents 0
    • The dotted pink lines always represent -0.10 and 0.10
    • The lower and upper limits of each track are set by the lowest and highest estimates in that track, and so vary across tracks
  • Tracks
    • The outer track is always BMI
    • The middle track is always body fat %
    • The inner track is always waist-hip ratio
  • Points:
    • Black - IVW
    • Green - MR Egger
    • Purple - Weighted median
    • Blue - Weighted mode
    • Open - P > 0.05/n (n = number of tests)
    • Solid - P < 0.05/n (n = number of tests)
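The ordering and point-style rules above can be applied to a results table before plotting. The sketch below uses the Bonferroni thresholds from the multiple-testing step (0.05/123 and 0.05/275). The column names and all values are made up for illustration and are not taken from the repo's outputs.

```r
# Bonferroni thresholds (0.05 divided by the number of tests)
thresholds <- 0.05 / c(kettunen = 123, shin = 275)

# Made-up IVW results; metabolite, class, ivw_b and ivw_p are hypothetical names
res <- data.frame(
  metabolite = c("Gly", "Ala", "LDL-C", "HDL-C"),
  class      = c("Amino acids", "Amino acids", "Cholesterol", "Cholesterol"),
  ivw_b      = c(0.08, -0.02, 0.05, -0.12),
  ivw_p      = c(1e-5, 0.3, 0.01, 1e-6),
  stringsAsFactors = FALSE
)

# Order alphabetically by class, then by IVW beta (lowest to highest) within class
res <- res[order(res$class, res$ivw_b), ]

# Solid points pass the Bonferroni threshold, open points do not
res$point <- ifelse(res$ivw_p < thresholds[["kettunen"]], "solid", "open")
res
```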