xQTL analysis pipeline

Developed for reproducible QTL analysis for the NIH/NIA Alzheimer's Disease Sequencing Project Functional Genomics Consortium.

How to use this resource

Standardized reference data

Reference data are standardized and curated by the ADSP FGC Standardization Workgroup in coordination with NIAGCADS. Please find reference data specifications on ADSP Dashboard.

Pipeline execution

Pipelines in this repository are written in the Script of Scripts (SoS) workflow language. Like most other workflow languages, SoS workflows can distribute and execute computing jobs directly in High Performance Computing cluster. It can also use containers (Docker or Singularity) to help with setting up computational environment and improve reproducibility. Unlike most other workflow languages, SoS workflows are created using SoS Notebooks (based on Ipython Notebook and developed in Jupyter) which allow for both scientific narrative and pipeline scripts in the same document. Unlike typical Jupyter Notebooks intended for interactive data analysis, SoS workflows written in Jupyter Notebooks can be executed directly as command line scripts either on a local computer or in a HPC environment.

We provide this toy example for running SoS pipeline on a typical HPC cluster environment. First time users are encouraged to try it out in order to help setting up the computational environment necessary to run the QTL analysis.

Source code

Source code of pipelines and containers implemented in this repository are available at https://github.com/cumc/xqtl-pipeline/tree/main/code.
Container configurations are available at https://github.com/cumc/xqtl-pipeline/tree/main/container.

Software container and minimal working example data

Minimal working examples are available through request to access this Google Drive folder.
Under the folder above you can find singularity image release for software environment. You can also build the singularity image from configuration files at: https://github.com/cumc/xqtl-pipeline/tree/main/container/singularity.

Organization of the resource

The website https://cumc.github.io/xqtl-pipeline is generated from files under code folder of the source code repository. The pipeline folder are symbolic links automatically generated for pipeline files under code. The logic of the entire xQTL analysis workflow is roughly reflected on the left side bar:

The COMMAND GENERATOR section is reserved for "push botton" commands that generates the entire QTL analysis pipeline workflow script from a simple configuration file. Notebooks under this sections are meant to be executed as command line software to generate data analysis commands. The generated commands can be executed as is to complete all available analysis, or can be used to help customizing specific analysis tasks by making modification on them. The configuration file itself helps centralized control and book keeping of workflows executed.
Other sections in bold contain various types of analysis available, roughly showing in order from upstream to downstream analysis. We will refer to them as analysis groups, which are further divided into protocols by various non-bold, clickable text under each analysis group linking to some notebooks. These notebooks illustrate commands to perform analysis implemented in the protocol. Most of them are "tutorials" in nature and are meant to be executed interactively in Jupyter or in command terminal to run the SoS pipelines line by line. A few are the actual pipeline modules implementing pipelines in SoS, as will be discussed next.
Protocols can be expanded by clicking on the down arrows to access the SoS workflows implemention of pipeline modules. These are the core pipeline implementations to be executed as command line software, and are meant to be self-contained --- they may be used in other contexts not specific to the xQTL data analysis. Each of these pipeline modules are documented with some background information, required input, expected output, and most importantly a minimal working example to allow users to test it out with a toy data-set before applying to their own analysis. The rest of the pipeline module are the actual code implementations.

xQTL workflow schema

Contributors

This repository is developed by the ADSP FG Brain xQTL consortium.

Developers

Lead developers

Hao Sun, Department of Neurology, Columbia University
Gao Wang, Department of Neurology, Columbia University

Contributors

Wenhao Gou, Department of Biostatistics, Columbia University
Liucheng Shi, Department of Biostatistics, Columbia University
Xuanhe Chen, Department of Biostatistics, Columbia University
Amanda Tsai, Department of Biostatistics, Columbia University

Brain xQTL project leadership

Philip De Jager, Department of Neurology, Columbia University
Carlos Crunchaga, Department of Psychiatry, Neurology and Genetics, Washington University in St. Louis

Brain xQTL methods and data integration work group

Gao Wang (work group leader), Department of Neurology, Columbia University
Xiaoling Zhang, Departments of Medicine and Biostatistics, Boston University
Edoardo Marcora, Departments of Neuroscience, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai

changebio/xqtl-pipeline

xQTL analysis pipeline

How to use this resource

Standardized reference data

Pipeline execution

Source code

Software container and minimal working example data

Organization of the resource

xQTL workflow schema

Contributors

Developers