UKBiobankGWAS

Notes and code for running UK Biobank GWAS at the MRC IEU

Steps

Request GWAS account from IEU data manager if not already done
Create input files in your RDSF input directory
Wait for files to copy over to BC4
On BC4, clone this repo and get .env
Run the GWAS submission script
A job is submitted to the queue that QCs the files, and then creates a new submission job for the GWAS
Wait for GWAS to complete and output files to sync back to RDSF

Details

Create input files on RDSF

Create jobs.csv in input directory, containing information on GWAS jobs

all column names must be present in the header
if a field has no value, leave it empty, e.g. ,,
separate multiple covariates with ;

name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
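The jobs.csv rules above can be sketched in Python as follows. This is an illustration only, not the pipeline's own parser: the helper names read_jobs and split_multi are hypothetical, and it assumes the ; separator applies to both covar_col and qcovar_col.

```python
import csv
import io

# Column names copied from the jobs.csv header shown in this README.
REQUIRED_COLUMNS = [
    "name", "application_id", "pheno_file", "pheno_col",
    "covar_file", "covar_col", "qcovar_col", "method",
]


def split_multi(value):
    """Split a ';'-separated cell into a list; an empty cell gives []."""
    return [item for item in (value or "").split(";") if item]


def read_jobs(handle):
    """Parse jobs.csv from an open text handle into a list of dicts.

    Hypothetical helper: raises ValueError if a required column is absent.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"jobs.csv is missing columns: {missing}")
    jobs = []
    for row in reader:
        # Assumption: both covariate columns may hold several names.
        row["covar_col"] = split_multi(row["covar_col"])
        row["qcovar_col"] = split_multi(row["qcovar_col"])
        jobs.append(row)
    return jobs


# The two example rows from this README.
example = """name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
"""

jobs = read_jobs(io.StringIO(example))
for job in jobs:
    print(job["name"], job["covar_col"], job["qcovar_col"])
```

Running something like this on your own jobs.csv before copying it to RDSF may catch a missing column early, but the pipeline's QC job remains the authoritative check.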

Each GWAS job is first checked to make sure that both the phenotype and covariate files exist, are in the correct format, and contain the specified columns.
If all checks pass, a submission script is created and run as a new Slurm job.
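The kind of check described above might look like the sketch below. It is not the pipeline's actual QC code: check_job_files and header_columns are hypothetical names, and the file layout (whitespace-delimited text with a header row, plus an id column) is an assumption made for the demo. See the BiobankPhenotypes wiki for the real format.

```python
import os
import tempfile


def header_columns(path):
    """Return column names from the first line of a whitespace-delimited file."""
    with open(path) as handle:
        return handle.readline().split()


def check_job_files(input_dir, pheno_file, pheno_col, covar_file, covar_cols):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    checks = [(pheno_file, [pheno_col]), (covar_file, list(covar_cols))]
    for file_name, wanted in checks:
        path = os.path.join(input_dir, file_name)
        if not os.path.isfile(path):
            problems.append(f"missing file: {file_name}")
            continue
        present = header_columns(path)
        for col in wanted:
            if col not in present:
                problems.append(f"{file_name}: missing column {col}")
    return problems


# Demo in a temporary directory with made-up file contents.
with tempfile.TemporaryDirectory() as input_dir:
    with open(os.path.join(input_dir, "test.txt"), "w") as handle:
        handle.write("id test_name\n1 0.5\n")
    with open(os.path.join(input_dir, "bolt_covariates.txt"), "w") as handle:
        handle.write("id sex chip age\n1 0 1 50\n")
    problems = check_job_files(
        input_dir, "test.txt", "test_name",
        "bolt_covariates.txt", ["sex", "chip", "age"],
    )
    print(problems or "all checks passed")
```

Collecting every problem in a list, rather than stopping at the first one, lets a single run report all issues with a job.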

Create phenotype and covariate files, and place them in RDSF input directory as before.

see https://github.com/MRCIEU/BiobankPhenotypes/wiki#phenotype-files for details

Setup and run job submission code on BC4

Set up GitHub SSH keys
Clone the repo to any directory on BC4: git clone git@github.com:MRCIEU/UKBiobankGWAS.git
Move into the directory: cd UKBiobankGWAS
Copy the .env file into this repository: cp /mnt/storage/private/mrcieu/research/UKBIOBANK_GWAS_Pipeline/scripts/.env ./

Single job

Run from within the repository

sbatch scripts/ukb_gwas.sh

by default this runs the first row in jobs.csv
to run a different row, pass its 0-based index, so row 3 is index 2, e.g. sbatch scripts/ukb_gwas.sh 2

Multiple jobs

Run from within this repository

for i in {0..1}; do echo $i; sbatch scripts/ukb_gwas.sh $i; done
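When jobs.csv has many rows, the same loop can be built from the row count. The sketch below only builds and prints the commands, it does not submit anything. The function name submission_commands is hypothetical, and the script path is the one used in the commands above.

```python
import shlex


def submission_commands(n_jobs, script="scripts/ukb_gwas.sh"):
    """Build one sbatch command per jobs.csv row, using 0-based row indices."""
    return [["sbatch", script, str(index)] for index in range(n_jobs)]


# Two data rows in jobs.csv give indices 0 and 1, matching {0..1} above.
for command in submission_commands(2):
    print(shlex.join(command))
```

To submit from Python on BC4, each command list could be passed to subprocess.run(command, check=True) from within the repository.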

Summary

Generate summary files, then parse them to create counts:

sbatch UKBiobankGWAS/scripts/summary.sh
python UKBiobankGWAS/scripts/summary_parser.py

To do

add args to allow only qc step
add plink
