This repo provides scripts and details of the data required to perform a Mendelian randomization (MR) analysis of different measures of adiposity and metabolites. We use different measures of adiposity as exposures and metabolites as outcomes.
MR is an analysis method in which genetic variants, most commonly single nucleotide polymorphisms (SNPs), act as instruments to proxy for exposures of interest. Because alleles are randomly allocated at conception, confounders should be distributed evenly across genotype groups in the population. Using SNPs to proxy for exposures can therefore give ‘unbiased’ causal estimates of the effect of an exposure (X) on an outcome (Y), provided the instrumental variable assumptions hold. MR is discussed at length elsewhere: Davey Smith & Ebrahim (2003), Davey Smith & Hemani (2014), Pierce & Burgess (2013).
For this work, unless otherwise stated, we used MR-Base to perform our analysis, specifically the associated TwoSampleMR R package. Full details of the MR-Base platform can be found in the publication by Hemani et al. (2018), but in brief:
MR-Base is a curated database of genome-wide association study results and associated applications that enable one to perform two-sample MR. Two-sample MR is where the SNP associations for the exposure and for the outcome used in the MR analysis come from two separate samples.
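To make the two-sample idea concrete, the following is a minimal base R sketch of how SNP-exposure and SNP-outcome summary statistics from two separate samples can be combined. The numbers are toy values invented for illustration, and the fixed-effect inverse variance weighted (IVW) formula shown is a textbook simplification, not necessarily what TwoSampleMR computes internally.

```r
# Toy summary statistics for three SNPs (illustrative values only, not real data)
beta_exposure <- c(0.10, 0.20, 0.15)    # SNP-exposure effects (sample 1)
beta_outcome  <- c(0.05, 0.10, 0.075)   # SNP-outcome effects (sample 2)
se_outcome    <- c(0.01, 0.02, 0.015)   # standard errors of the SNP-outcome effects

# Wald ratio: one causal estimate per SNP
wald_ratio <- beta_outcome / beta_exposure

# Fixed-effect IVW: weighted mean of the Wald ratios,
# weighting each SNP by beta_exposure^2 / se_outcome^2
ivw_weights  <- beta_exposure^2 / se_outcome^2
ivw_estimate <- sum(ivw_weights * wald_ratio) / sum(ivw_weights)
ivw_se       <- sqrt(1 / sum(ivw_weights))

print(data.frame(estimate = ivw_estimate, se = ivw_se))
```

In this toy example every SNP gives the same Wald ratio, so the IVW estimate equals that shared ratio; with real data the per-SNP estimates differ and the weights matter.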
The different measures of adiposity used for this analysis are:
BMI
- Yengo_941
  - SNPs are from Yengo et al. (2018) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
  - UK Biobank & GIANT consortium
  - European
  - n = 515,509 to 795,624
  - 941 SNPs at P < 5e-8
- Yengo_646
  - SNPs are from Yengo et al. (2018) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
  - UK Biobank & GIANT consortium
  - European
  - n = 515,509 to 795,624
  - 646 SNPs at P < 1e-8
- Locke_77
  - SNPs are from Locke et al. (2015) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
  - GIANT consortium
  - European
  - n =
  - 77 SNPs at P < 5e-8
WHR
- Pulit_316
  - SNPs are from Pulit et al. (2019) and can be downloaded from the study GitHub. Data used for this analysis can be downloaded directly from the GitHub repository by navigating to ‘SuppTable1’ and clicking ‘whr.giant-ukbb.meta.1.merged.indexSnps.combined.parsed.txt’, or by opening this link.
  - UK Biobank & GIANT consortium
  - European
  - n = 485,486 to 697,702
  - 316 SNPs at P < 5e-8
- Shungin_....
  - SNPs are from Shungin et al. (2015) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
  - GIANT consortium
  - European
  - n =
  - SNPs at P < 5e-8
WHRadjBMI
- Pulit_346
  - SNPs are from Pulit et al. (2019) and can be downloaded from the study GitHub. Data used for this analysis can be downloaded directly from the GitHub repository by navigating to ‘SuppTable1’ and clicking ‘whradjbmi.giant-ukbb.meta.1.merged.indexSnps.combined.parsed.txt’, or by opening this link.
  - UK Biobank & GIANT consortium
  - European
  - n =
  - 346 SNPs at P < 5e-8
- Shungin_....
  - SNPs are from Shungin et al. (2015) and can be downloaded from the GIANT website. Data used for this analysis can be downloaded directly from this link.
  - GIANT consortium
  - European
  - n =
  - SNPs at P < 5e-8
BF
- Lu_7
  - SNPs are from Lu et al. (2016) and can be downloaded from the GWAS Catalog. Data used for this analysis can be found in Supplementary Table 6 of the Supplementary Material of the published article.
  - GIANT consortium
  - European
  - n = 60,210 to 89,287
  - 7 SNPs
- Hubel_76
  - SNPs are from Hübel et al. (2019) and can be downloaded from the study website. Data used for this analysis can be downloaded directly from Supplementary Table 5a of the published paper.
  - UK Biobank
  - European
  - n = 155,961
  - 76 SNPs
These data are required to run the analysis described herein. All other data, namely the outcome data, are available through the MR-Base platform.
The different metabolite data used for this analysis are:
Kettunen et al. (2016) metabolite data:
- Consortium of 14 studies
- European
- n = 24,925
- 123 metabolites profiled using NMR
  - The metabolite set covers multiple metabolic pathways, including lipoprotein lipids and subclasses, fatty acids, amino acids, and glycolysis precursors.
- 74 independent loci
  - 62 independent loci identified in meta-analysis
    - 9 additional independent secondary associations within the 62
      - 2 additional independent tertiary associations within the 9
        - 1 additional independent quaternary association within the 2
- MR-Base ID = 838:960
Shin et al. (2014) metabolite data:
- Two studies: KORA and TwinsUK
- European
- n = 7,824
- 452 metabolites
  - 177 unknown
- 299 SNP-metabolite associations
- 145 independent SNPs
- MR-Base ID = 303:754
002_adiposity_metabolites
¦--mrbase.oauth
¦--analysis
¦   ¦--BMI
¦   ¦   °--plots
¦   ¦--BF
¦   ¦   °--plots
¦   °--WHR
¦       °--plots
¦--data
¦--environment
¦   °--environment.sh
¦--output
°--scripts
    ¦--log
    ¦--.Renviron
    ¦--step1_BF_metabolites_MR.R
    ¦--step1_BF_metabolites_qsub.sh
    ¦--step1_BMI_metabolites_MR.R
    ¦--step1_BMI_metabolites_qsub.sh
    ¦--step1_WHR_metabolites_MR.R
    °--step1_WHR_metabolites_qsub.sh
- analysis: where outputs from scripts are stored
- data: where the data that the scripts call are stored
- environment: where my environment.sh script, which holds the global path, is stored
- output: where the final publishable results for this project are stored
- scripts: where all scripts for this project are housed
  - I run all of my scripts from this directory using the following command in Terminal:
    qsub my_script.sh -d ./
  - log: where the error and output log files from my submitted jobs are stored
- I run all of my R scripts using the University of Bristol high performance computer BlueCrystal 3, submitting them as jobs using .sh files
- First, I create an environment.sh file (in a directory called environment) containing my global file path.
  - I call this environment.sh file from my .sh submission scripts to set my working directory and all subsequent directories.
- Second, I create a .Renviron file (in the directory where all of my R scripts are stored) with the same global file path as the environment.sh file.
  - I call the .Renviron file from my .R scripts to set my working directory and all subsequent directories using Sys.getenv().
- All of my scripts for analysis are in scripts
- All R scripts save outputs to analysis, in the analysis-specific directory within it
  - E.g. step1_BMI_metabolites_MR.R will output to ../analysis/BMI/, and any plots generated will output to ../analysis/BMI/plots/
- All .sh submission files produce error and output files, which are written to ../scripts/log/
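As a rough illustration of the .Renviron and Sys.getenv() pattern described above, the sketch below builds the analysis paths from a single environment variable. The variable name PROJECT_DIR and the path are hypothetical placeholders, not the names used in this repo; Sys.setenv() is called here only so the example is self-contained, whereas in practice the value would be read from the .Renviron file at R start-up.

```r
# Hypothetical variable name and path; substitute whatever your .Renviron defines.
# Sys.setenv() stands in for the .Renviron file so this example runs on its own.
Sys.setenv(PROJECT_DIR = "/path/to/002_adiposity_metabolites")

# Read the global project path and fail early if it is missing
project_dir <- Sys.getenv("PROJECT_DIR")
if (!nzchar(project_dir)) {
  stop("PROJECT_DIR is not set; check the .Renviron file")
}

# Derive the analysis-specific output directories from the global path
analysis_dir <- file.path(project_dir, "analysis", "BMI")
plots_dir    <- file.path(analysis_dir, "plots")

print(analysis_dir)
print(plots_dir)
```

Keeping the global path in one place means the same scripts can run on another machine after editing only the .Renviron and environment.sh files.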
- First, from within the scripts directory, I submit step1_???_metabolites_qsub.sh as a job to BlueCrystal 3
  - This job calls the R script of the same name, i.e. step1_???_metabolites_MR.R, which performs the analysis of the adiposity measure and the two metabolite data sets
  - step1_???_metabolites_MR.R script explanation
    - Load relevant libraries and set environment
    - Identify outcomes of interest in the MR-Base catalogue
    - Read in exposure data using read_exposure_data()
    - Extract exposure SNPs from outcome data using extract_outcome_data() and the following variables:
      - proxies - look for LD proxies = yes
      - rsq - if proxies = yes, minimum LD R^2 value = 0.8
      - align_alleles - if proxies = yes, try to align proxies to target alleles = yes
      - palindromes - if proxies = yes, allow palindromic SNPs = yes
      - maf_threshold - MAF threshold to try to infer palindromic SNPs = 0.3
    - Harmonise alleles between exposure and outcome using harmonise_data() and the following variables:
      - action = 2 - try to infer positive strand alleles, using allele frequencies for palindromes
    - Perform MR analysis using mr()
      - method_list sets the methods to use for the MR analysis
    - Additional tests:
      - mr_singlesnp() - obtain MR results for each single SNP of the exposure and each outcome; the default single-SNP method is the Wald ratio. The function also calculates the full MR result using all SNPs; the default methods are IVW and MR Egger
      - mr_heterogeneity() - obtain MR heterogeneity statistics
      - mr_pleiotropy_test() - obtain the MR Egger intercept for each test to assess horizontal pleiotropy
    - Save results
- Second, I correct for multiple testing across the tests performed within each analysis, using a Bonferroni threshold of 0.05 divided by the number of tests:
  - 0.05/123 for the Kettunen metabolite data outcome
  - 0.05/275 for the Shin metabolite data outcome
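The two steps above can be sketched as follows. This is not the repo's step1 script: the function name run_adiposity_mr(), the exposure file layout (tab-separated, using TwoSampleMR's default column names) and the choice of four methods in method_list are assumptions made for illustration. The function is only defined here, not called, because running it needs the TwoSampleMR package and access to MR-Base. The 275 tests for the Shin data presumably correspond to the 452 metabolites minus the 177 unknown ones, although the text does not state this.

```r
# Sketch only: names, file layout and method choices are assumptions,
# not taken from the step1_???_metabolites_MR.R scripts.
run_adiposity_mr <- function(exposure_file, outcome_ids) {
  # Read exposure SNPs; assumes a tab-separated file with TwoSampleMR's
  # default column names (SNP, beta, se, effect_allele, other_allele, eaf, pval)
  exposure_dat <- TwoSampleMR::read_exposure_data(exposure_file, sep = "\t")

  # Look up the exposure SNPs in the outcome GWAS, allowing LD proxies
  outcome_dat <- TwoSampleMR::extract_outcome_data(
    snps = exposure_dat$SNP,
    outcomes = outcome_ids,
    proxies = TRUE,
    rsq = 0.8,
    align_alleles = 1,
    palindromes = 1,
    maf_threshold = 0.3
  )

  # Harmonise effect alleles; action = 2 tries to infer the strand of
  # palindromic SNPs from allele frequencies
  dat <- TwoSampleMR::harmonise_data(exposure_dat, outcome_dat, action = 2)

  # Main MR analysis plus the additional tests
  list(
    mr = TwoSampleMR::mr(
      dat,
      method_list = c("mr_ivw", "mr_egger_regression",
                      "mr_weighted_median", "mr_weighted_mode")
    ),
    single_snp    = TwoSampleMR::mr_singlesnp(dat),
    heterogeneity = TwoSampleMR::mr_heterogeneity(dat),
    pleiotropy    = TwoSampleMR::mr_pleiotropy_test(dat)
  )
}

# Bonferroni threshold: alpha divided by the number of tests
bonferroni_threshold <- function(n_tests, alpha = 0.05) {
  alpha / n_tests
}

kettunen_threshold <- bonferroni_threshold(123)  # Kettunen metabolite outcomes
shin_threshold     <- bonferroni_threshold(275)  # Shin metabolite outcomes
print(c(kettunen = kettunen_threshold, shin = shin_threshold))
```

A call might look like run_adiposity_mr("path/to/exposure_snps.txt", outcome_ids), where the file path is a placeholder and the format of the outcome IDs depends on the TwoSampleMR version in use.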
- All figures show the raw beta estimates for the MR analysis
- Order of metabolites
  - Metabolites are first ordered alphabetically by class
  - Within each class, metabolites are then ordered from lowest to highest beta of the IVW estimate
- Axis
  - The solid pink line always represents 0
  - The dotted pink lines always represent -0.10 and 0.10
  - The lower and upper limits of each track are set by the lowest and highest estimates for that track, and so vary across tracks
- Tracks
  - The outer track is always BMI
  - The middle track is always body fat %
  - The inner track is always waist hip ratio
- Points:
  - Black - IVW
  - Green - MR Egger
  - Purple - Weighted median
  - Blue - Weighted mode
  - Open - P > 0.05/n tests
  - Solid - P < 0.05/n tests
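The ordering and point-style rules above can be illustrated with a small base R sketch. The data frame, its column names and its values are invented for illustration and are not results from this analysis.

```r
# Toy IVW results (illustrative values only; column names are hypothetical)
ivw_results <- data.frame(
  metabolite = c("Ala", "XL-HDL-P", "Gln", "S-VLDL-P"),
  class      = c("Amino acids", "Lipoprotein subclasses",
                 "Amino acids", "Lipoprotein subclasses"),
  beta       = c(0.08, -0.12, -0.03, 0.15),
  pval       = c(0.0001, 0.02, 0.3, 0.00001),
  stringsAsFactors = FALSE
)
n_tests <- 123  # e.g. the number of Kettunen metabolites

# Order alphabetically by class, then from lowest to highest IVW beta within class
ordered_results <- ivw_results[order(ivw_results$class, ivw_results$beta), ]

# Solid points pass the multiple testing threshold, open points do not
ordered_results$point <- ifelse(ordered_results$pval < 0.05 / n_tests,
                                "solid", "open")

print(ordered_results)
```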