RELI (Regulatory Element Locus Intersection) is an algorithm for discovering transcription factors (TFs) that bind a significant number of loci associated with a given disease or phenotype (e.g., through a Genome Wide Association study, or GWAS).
The major data components are:
- An input set of disease- or phenotype-associated genetic variants (RS IDs)
- An internal “library” consisting of many ChIP-seq dataset peaks (in the form of .bed files)
- An internal file containing information on genetic variant allele frequencies, etc.
To assess the significance of the intersection between the input disease variants and a given TF ChIP-seq dataset, RELI performs simulations, generating a null distribution used for P-value calculations.
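As a rough illustration of how a simulated null distribution can yield a P-value, the sketch below computes an empirical P-value from a list of simulated overlap counts. This is a generic permutation-style calculation and not necessarily the statistic RELI itself reports; the function name empirical_p and the one-count-per-line input format are assumptions made for this example.

```shell
# empirical_p OBSERVED
# Reads simulated overlap counts (one per line) on stdin and prints
# (number of simulations with overlap >= OBSERVED, plus 1) / (simulations + 1).
# Hypothetical helper for illustration; not part of RELI.
empirical_p() {
  awk -v obs="$1" '
    # Skip blank lines; count simulations at least as extreme as the observation.
    NF { n++; if ($1 + 0 >= obs + 0) ge++ }
    # The +1 terms keep the estimate from ever being exactly zero.
    END { printf "%.4f\n", (ge + 1) / (n + 1) }
  '
}
```

For example, `printf '1\n2\n3\n4\n' | empirical_p 3` prints 0.6000, since two of the four simulated counts are at least 3.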
The output of RELI is a series of statistics based upon the significance of the overlap between the input genetic variants and the selected ChIP-seq dataset.
Additional details on RELI and the associated findings can be found in its accompanying publication.
RELI requires a C++11 compiler (e.g., GCC 4.7 or higher) and libgsl and libgslcblas from the GNU Scientific Library.
You may download the latest release as a compressed archive from GitHub, or clone the repository with Git:
# GitHub
git clone https://github.com/WeirauchLab/RELI.git
# Weirauch Lab GitLab
git clone https://tfwebdev.research.cchmc.org/gitlab/ches2d/RELI_public.git
A GNU-style Makefile is provided in the repository. With GSL installed system-wide, you can build the RELI binary with just
make
then run ./RELI with no arguments to verify that you have a working binary (you should get a help screen).
In order to run a test analysis, you need to download the sample data either manually (see the next section) or just type
make test
which will download and validate the sample datasets automatically, then run example/example_run.sh to invoke RELI on the sample data.
This test analysis requires around 10 GB of RAM to finish successfully; 16 GB is recommended.
The included Makefile will respect CFLAGS and LDFLAGS if set in the environment. This is useful, for example, if you have a locally built GSL installed in a non-standard place (such as your home directory):
CFLAGS=-I/path/to/include LDFLAGS=-L/path/to/lib make
If g++ is not available in your PATH (or it has a different name), you will likely want to modify the Makefile directly, beginning around line 33 with the CC variable.
RELI has also been verified to build and run on the following platforms (in addition to GNU/Linux):
- Windows with Cygwin and GCC 5.4.0 (ensure that the gcc-g++, make, gsl, libgsl-devel, and curl packages are installed, at a minimum)
- Mac OS X 10.11.6 (El Capitan) with LLVM 8.0.0 (provided by the Xcode Command Line Tools) and GSL installed from MacPorts
On Windows, make sure you run make (or the example/example_run.sh script) from within the Cygwin shell, not the Windows Command Prompt or PowerShell.
You may need to lightly modify the Eclipse CDT build toolchain settings if your installation of Cygwin is not at C:\Cygwin64.
Eclipse CDT project settings files are also included for both of the above toolchains. Just create a copy (or symlink) of the appropriate one called .cproject, then choose File → Import... → Existing Projects into Workspace and browse to where you cloned the repository.
If you have problems with make test (perhaps you don't have curl available), you can manually download and extract the sample datasets such that the decompressed data is inside a data subdirectory within the RELI_public repository you cloned above. A .zip-format archive is also provided, in case you don't have bzip2 available.
You can run the sample analysis by changing into the example directory and running example_run.sh in a terminal like so:
user@[/path/to/repo]$ cd example
user@[/path/to/repo/example]$ ./example_run.sh
Options not marked "(optional)" are required.
Option | Explanation |
---|---|
-snp FILE | Phenotype SNP file in 4-column BED format |
-ld FILE | (optional) Phenotype linkage disequilibrium (LD) structure for SNPs, default: no LD file |
-index FILE | ChIP-seq index file |
-data DIR | Directory where ChIP-seq data are stored |
-target STRING | Target label of the ChIP-seq experiment to be tested, from the index file |
-build FILE | Genome build file |
-null FILE | Null model file |
-dbsnp FILE | dbSNP table file |
-out DIR | Output directory name, created under the current working directory |
-match | (optional) Boolean switch to turn on minor allele frequency-based matching, default: off |
-rep NUMBER | (optional) Number of permutations/simulations to be performed, default: 2000 |
-corr NUMBER | (optional) Bonferroni correction multiplier for multiple testing, default: 1 |
-phenotype STRING | (optional) User-provided phenotype name, default: "." |
-ancestry STRING | (optional) User-provided ancestry name, default: "." |
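To show how the options above fit together, here is a minimal sketch of an invocation wrapped in a shell function. Apart from data/ChIPseq.index and data/ChIP-seq, every file name and label in it is a placeholder invented for this example (the bundled example_run.sh may well use different ones), and the run_reli wrapper and the RELI_BIN and DATA_DIR variables are likewise hypothetical.

```shell
# Hypothetical wrapper around the RELI binary.
# Every file name and label below is a placeholder; substitute your own.
run_reli() {
  reli_bin="${RELI_BIN:-./RELI}"
  data_dir="${DATA_DIR:-data}"
  # Required options first, then a few optional ones at their documented defaults
  # (-match is off by default and is switched on here).
  set -- \
    -snp "my_phenotype.snp" \
    -index "$data_dir/ChIPseq.index" \
    -data "$data_dir/ChIP-seq" \
    -target "my_target_label" \
    -build "my_genome_build.txt" \
    -null "my_null_model" \
    -dbsnp "my_dbsnp_table" \
    -out "Output" \
    -match -rep 2000 -corr 1
  if [ -x "$reli_bin" ]; then
    "$reli_bin" "$@"
  else
    # No binary found: print the command instead of running it.
    echo "dry run: $reli_bin $*"
  fi
}
```

Calling run_reli without a built binary only prints the assembled command, which can be a convenient way to check the arguments before committing to a long run.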
To add an additional ChIP-seq dataset, create an entry in the ChIP-seq index file (data/ChIPseq.index) with the following tab-delimited format:
label ⇥ source ⇥ Cell ⇥ TF ⇥ Cell label ⇥ PMID ⇥ Group ⇥ EBV Status ⇥ Species
where label corresponds to the filename, which you should deposit in the data/ChIP-seq directory (in 4-column BED format).
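The steps above could be scripted roughly as follows. This is only a sketch: the add_chipseq_entry helper is hypothetical, it assumes the label is the peak file's name taken verbatim, and it assumes the ChIP-seq directory sits next to the index file, as in the data/ layout described above.

```shell
# add_chipseq_entry INDEX_FILE BED_FILE SOURCE CELL TF CELL_LABEL PMID GROUP EBV_STATUS SPECIES
# Hypothetical helper: copies the peak file into the ChIP-seq directory next to
# the index file and appends a matching nine-column, tab-delimited index entry.
add_chipseq_entry() {
  if [ "$#" -ne 10 ]; then
    echo "usage: add_chipseq_entry INDEX BED SOURCE CELL TF CELL_LABEL PMID GROUP EBV_STATUS SPECIES" >&2
    return 1
  fi
  index_file=$1
  bed_file=$2
  shift 2
  # Assumption: the label is the file name, taken verbatim.
  label=$(basename "$bed_file")
  chip_dir="$(dirname "$index_file")/ChIP-seq"
  mkdir -p "$chip_dir" || return 1
  cp "$bed_file" "$chip_dir/$label" || return 1
  # label + the eight remaining fields, separated by tabs.
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$label" "$@" >> "$index_file"
}
```

A call might look like `add_chipseq_entry data/ChIPseq.index my_tf_peaks "my lab" "B cell" MYTF "B cell line" 0 MyGroup negative "Homo sapiens"`, where all field values are made up for illustration.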
To use a different genome build, use the UCSC fetchChromSizes utility (usage information here) to download chromosome information for that build. You may wish to prune lines representing unmapped chromosome information (e.g., chrN_glXXXXXX_random and chrUn_glXXXXXX) from the downloaded data file.
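One way to do the pruning is with a small filter such as the sketch below. The prune_chrom_sizes function name and the output file name in the usage comment are made up for this example, and the pattern only targets the two naming schemes mentioned above, so other builds may need a different expression.

```shell
# prune_chrom_sizes: reads chromosome-size lines on stdin and drops entries
# for unplaced or unlocalized sequences (names containing "_random" or
# starting with "chrUn_"). Hypothetical helper for illustration.
prune_chrom_sizes() {
  grep -Ev '_random|^chrUn_'
}

# Possible usage, assuming fetchChromSizes is installed and on your PATH:
#   fetchChromSizes hg19 | prune_chrom_sizes > my_genome_build.txt
```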
Be advised, however, that the null model included with the data was generated for Homo sapiens at build hg19; using a later "hg" build may invalidate this model.
Please contact us via email (or file an issue against the public GitHub repository) for additional details, or if you need support for a different organism.
Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity.
Harley JB, Chen X, Pujato M, Miller D, Maddox A, Forney C, Magnusen AF, Lynch A, Chetal K, Yukawa M, Barski A, Salomonis N, Kaufman KM, Kottyan LC, Weirauch MT.
Nat Genet. 2018 Apr 16. doi: 10.1038/s41588-018-0102-3. [Epub ahead of print]
PMID: 29662164
Please report any issues with RELI (or feature suggestions) in our GitHub issue tracker.
For other questions, you may contact Dr. Chen (the primary author of RELI) or Dr. Weirauch via email.
Name | Institution | Remarks |
---|---|---|
Dr. Xiaoting Chen | Cincinnati Children's Hospital | primary author |
Project avatar based on Wikimedia Commons Chromosome_18.svg