/empirical-JTK_CYCLE-with-asymmetry

This repository contains the code described in Hutchison, Maienschein-Cline, and Chiang et al. (2015) Improved statistical methods enable greater sensitivity in rhythm detection for genome-wide data. PLoS Comput Biol 11(3): e1004094. doi:10.1371/journal.pcbi.1004094

Primary LanguagePythonMIT LicenseMIT

#Empirical JTK_CYCLE

Hi!

This is the README for running the rhythm detection method described in Hutchison AL, Maienscein-Cline M, Chiang AH, Tabei SMA, Gudjonson H, Bahroos N, Allada R, Dinner AR. “Improved statistical methods enable greater sensitivity in rhythm detection for genome-wide data.” PLoS Computational Biology 2015 Mar. Vol. 11, No. 3, pp. e1004094. doi:10.1371/journal.pcbi.1004094

It is based on the original JTK_CYCLE code from Hughes ME, Hogenesch JB, Kornacker K. JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms. 2010 Oct;25(5):372-80. doi:10.1177/0748730410379711.

A previous version of this method empirically calculated the p-values using many permutations. We have recently sped up this calculation by approximating the null tau distribution using a Gamma distribution based on 1000 permutations. P-values can be derived from this model, allowing for p-values independent of the number of permutations. The description of this code can be found in Hutchison AL et al. (2016), "BooteJTK: Improved Rhythm Detection via Bootstrapping", available at bioRxiv.

##Applications:

An application of the method to mouse and human pancreas and liver can be found in Perelis et al. "Pancreatic β cell enhancers regulate rhythmic transcription of genes controlling insulin secretion" (2015). 350(6261). aac4250. doi:10.1126/science.aac4250

An application of the method to Drosophila neurons can be found in Flourakis M et al. "A conserved bicycle model for circadian clock control of membrane excitability" Cell (2015). 162(4). 836-848. doi:10.1016/j.cell.2015.07.036

An application of the method to 16S gut microbiome data can be found in Leone VA et al. “Effects of diurnal variation of gut microbes and high fat feeding on host circadian clock function and metabolism” Cell Host-Microbe (2015). 17(5). 681-689. 13 May doi:10.1016/j.chom.2015.03.006

Execution

Use

chmod 755 eJTK-CalcP.py

to make eJTK-CalcP.py executable.

Type

cd bin/
python setup.py build_ext --inplace
cd ../bin/

to compile the Cython code.

Running this command will procude three files

./eJTK-CalcP.py -f example/TestInput4.txt -w ref_files/waveform_cosine.txt -p ref_files/period24.txt -s ref_files/phases_00-22_by2.txt -a ref_files/asymmetries_02-22_by2.txt -x cos24_ph00-22_by2_a02-22_by2_OTHERTEXT
  1. example/TestInput4_cos24_ph00-22_by2_a02-22_by2_OTHERTEXT_jtkout.txt This is the output of eJTK.py, it contains the best reference waveform matching each time series. Best is defined as having the highest Tau value. This becomes input for CalcP.py.

  2. example/TestInput4_cos24_ph00-22_by2_a02-22_by2_OTHERTEXT_jtknull1000.txt This is the output of eJTK.py unless otherwise specified by the -n flag (see eJTK-CalcP.py -h for more information). It similar to *jtkout.txt only it contains the results of 1000 runs of Gaussian noise. It is also an input for CalcP.py

  3. example/TestInput4_cos24_ph00-22_by2_a02-22_by2_OTHERTEXT_jtkout_GammaP.txt This is the output of CalcP.py. It is the equivalent of *jtkout.txt, only now with correct p-values as estimated by fitting the time series to a Gamma distribution. It also contains a column of these p-values adjusted with the Benjamini-Hochberg correction.

If you run the above command as is will produce files with a '_1' appended, as these files already exist in the examples folder.

Output information:

The output *jtkout.txt and *GammaP.txt files have columns as described below

ID: Name of time series analyzed

Waveform: Waveform used in analysis

Period: Period of best matching reference waveform

Phase: Phase of best matching reference waveform

Nadir: Trough of best matching reference waveform

Mean: Mean of time series

Std_Dev: Standard Deviation of time series

MaxLoc: Location of maximum of time series

MinLoc: Location of minimum of time series

Max: Maximum value of time series

Min: Minimum value of time series

Max_Amp: Max-Min of time series

FC: Fold Change (Max/Min)

IQR_FC: Fold Chnage of 25% and 75% percentiles of time series

Tau: Highest Kendall Tau Correlation between time series and reference waveforms

P: P-value corresponding to Tau, uncorrected for multiple hypothesis testing

BF: Bonferroni adjusted P-value from P

empP: min(P-value calculated from empirical null distribution,BF)

GammaP: min(P-value calculated from Gamma Fit of empirical null distribution,empP)

GammaBH: Benjmanini-Hochberg adjusted p-value of GammaP

##Version information:

  • Python 2.7.11 (default, Dec 5 2015, 14:44:47)

  • [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin

  • cython.version 0.24

  • scipy.version 0.15.1

  • numpy.version 1.11.0

  • statsmodels.version 0.6.1

License:

This code is released with the MIT License. See the License.txt file for more information.