GP3

GWAS Pre-Processing Pipeline

Primary language: Python. License: MIT.

Overview and Purpose


An automated pipeline that pre-processes GWAS data to identify samples to remove before input into imputation and association testing pipelines. Use it after an initial round of QC/filtering has been performed (i.e., after removing SNPs and samples that fail due to poor SNP/sample quality from the IDATs).

Software Requirements


The following are the minimum software requirements:

--R libraries that need to be installed manually--
The following R libraries, including their dependencies, must be installed and functional:

--Software Requirements that can be installed automatically--
The following Python libraries are required, but the pipeline can install them automatically if pip is available:

User Generated File Requirements


At minimum, two files are required to run the pipeline:

  • Input PLINK file either in .bed or .ped format
  • Populated sample_sheet_template.xlsx
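
A quick pre-flight check can catch a missing input before a run starts. The following is a minimal sketch, not part of the pipeline itself: the function name `missing_inputs` is hypothetical, and it assumes the standard PLINK filesets, where a `.bed` file is accompanied by `.bim` and `.fam` files and a `.ped` file by a `.map` file. Whether GP3 needs every companion file is an assumption here.

```python
from pathlib import Path

# Companion files that standard PLINK filesets keep next to the primary file.
PLINK_COMPANIONS = {".bed": (".bim", ".fam"), ".ped": (".map",)}


def missing_inputs(plink_file, sample_sheet):
    """Return the required input paths that do not exist (hypothetical helper)."""
    plink_path = Path(plink_file)
    suffix = plink_path.suffix.lower()
    if suffix not in PLINK_COMPANIONS:
        raise ValueError(f"expected a .bed or .ped file, got: {plink_path.name}")
    # The PLINK file itself, the populated sample sheet, and the companions.
    required = [plink_path, Path(sample_sheet)]
    required += [plink_path.with_suffix(ext) for ext in PLINK_COMPANIONS[suffix]]
    return [str(p) for p in required if not p.exists()]
```

An empty list means every expected file was found; otherwise the list names the paths to supply before launching the pipeline.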

Installation of virtual environment, chunkypipes, and pipeline


Please click here for detailed instructions on setting up a virtual environment for shared systems or for installation on systems with root privileges.

ALREADY INSTALLED CHUNKYPIPES AND PIPELINE? Click here for quick start.

Output Files


If you navigate to your output directory, you should see a new directory matching the project name you specified at the time of the run. Inside this directory are new directories named after the ethnic groups you specified in your sample_sheet.xlsx, as well as a set of PDFs. These PDFs are the ones promised above in the diagram. If you navigate into one of the ethnic group directories, you will find several PLINK files that were generated at each step of the pipeline.
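
The layout described above can be inspected programmatically. This is a minimal sketch under the assumption that the project directory sits directly inside the output directory and holds only ethnic group subdirectories and PDFs at its top level; `summarize_project_dir` is a hypothetical helper, not a GP3 function.

```python
from pathlib import Path


def summarize_project_dir(output_dir, project_name):
    """List group subdirectories and top-level PDFs of a finished run (hypothetical helper)."""
    project = Path(output_dir) / project_name
    if not project.is_dir():
        raise FileNotFoundError(f"no project directory at {project}")
    entries = sorted(project.iterdir())
    return {
        # Each subdirectory is assumed to correspond to one ethnic group.
        "groups": [p.name for p in entries if p.is_dir()],
        "pdfs": [p.name for p in entries if p.is_file() and p.suffix.lower() == ".pdf"],
    }
```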

Additionally, here are the notable final files of interest if the --TGP flag is specified:

  1. <ethnic group name>_all_samples_to_remove_from_original.txt
  2. <ethnic group name>_all_steps_completed_TGP_final with each of the following suffixes: .bed, .bim, .fam, .kin, .kin0, .gds
  3. <ethnic group name>_all_steps_completed_TGP_final_GENESIS.Rdata
  4. <ethnic group name>_all_steps_completed_TGP_final_phenoGENESIS.txt
  5. <ethnic group name>_TGP_PCA_plots.pdf

If the --TGP flag is not specified, these are the final output file names:

  1. <ethnic group name>_all_samples_to_remove_from_original.txt
  2. <ethnic group name>_all_steps_completed_final with each of the following suffixes: .bed, .bim, .fam, .kin, .kin0, .gds
  3. <ethnic group name>_all_steps_completed_TGP_final_GENESIS.Rdata
  4. <ethnic group name>_all_steps_completed_TGP_final_phenoGENESIS.txt
  5. <ethnic group name>_all_steps_completed_final_GENESIS_sample_key_file.txt
  6. <ethnic group name>_individual_PCA_plots.pdf
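
To check a finished run against the two lists above, the expected names can be generated per ethnic group. This is a rough sketch: `expected_final_files` is a hypothetical helper, and the name patterns are copied from the lists exactly as written, including the `TGP` in items 3 and 4 of the second list.

```python
# Suffixes listed for the "_final" fileset in both modes.
FINAL_SUFFIXES = (".bed", ".bim", ".fam", ".kin", ".kin0", ".gds")


def expected_final_files(group, tgp):
    """Build the final file names listed above for one ethnic group (hypothetical helper)."""
    names = [f"{group}_all_samples_to_remove_from_original.txt"]
    stem = f"{group}_all_steps_completed_TGP_final" if tgp else f"{group}_all_steps_completed_final"
    names += [stem + suffix for suffix in FINAL_SUFFIXES]
    # These two names appear with "TGP" in both lists above; copied as listed.
    names.append(f"{group}_all_steps_completed_TGP_final_GENESIS.Rdata")
    names.append(f"{group}_all_steps_completed_TGP_final_phenoGENESIS.txt")
    if tgp:
        names.append(f"{group}_TGP_PCA_plots.pdf")
    else:
        names.append(f"{group}_all_steps_completed_final_GENESIS_sample_key_file.txt")
        names.append(f"{group}_individual_PCA_plots.pdf")
    return names
```

Here `group` stands for whatever ethnic group name was entered in the sample sheet. Comparing the returned names with the contents of the group directory shows which final files are absent.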

Questions?


For more information, please visit the Wiki or contact me (tbrunetti), and I would be happy to address any issues.