This module allows for:
- Training PSSM/TFFM/4-bits + DNA shape classifiers on ChIP-seq data
- Applying PSSM/TFFM/4-bits + DNA shape classifiers on ChIP-seq data
Note that only the best hit per ChIP-sequence is considered in the current version of the module.
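As an illustration of what "best hit per ChIP-sequence" means, here is a minimal sketch that keeps the highest-scoring hit for each sequence. It is not the module's actual implementation: the function name `best_hit_per_sequence` and the `(sequence_id, start, end, score)` tuple layout are hypothetical and chosen only for this example.

```python
def best_hit_per_sequence(hits):
    """Keep only the highest-scoring hit for each sequence.

    `hits` is an iterable of (sequence_id, start, end, score) tuples.
    This layout is an assumption made for the example.
    """
    best = {}
    for seq_id, start, end, score in hits:
        # Replace the stored hit only when the new score is strictly higher,
        # so the first hit encountered wins in case of ties.
        if seq_id not in best or score > best[seq_id][3]:
            best[seq_id] = (seq_id, start, end, score)
    return best


# Two hits on peak_1, one hit on peak_2 (made-up values).
hits = [
    ("peak_1", 10, 22, 7.5),
    ("peak_1", 40, 52, 9.1),
    ("peak_2", 5, 17, 3.2),
]
best = best_hit_per_sequence(hits)
```

With this input, `best` holds one entry per sequence, and the lower-scoring hit on `peak_1` is discarded.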
The module requires:
- Python 2.7 (the current version does not work with Python 3).
- the Biopython module (www.biopython.org).
- the TFFM package, accessible from your PYTHONPATH environment variable.
- the scikit-learn module.
- the XGBoost module.
- the pandas module.
- access to bigWig files providing the values of the DNA shape features HelT, MGW, ProT, and Roll for your genome of interest, along with the second-order computation of these features. Please visit the GBshape website.
- the bwtool utility.
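To check the requirements above before running the tool, a small script along the following lines may help. It is a sketch under assumptions, not part of the package: the function name `check_requirements` is hypothetical, and the import names (`Bio` for Biopython, `sklearn` for scikit-learn) reflect how those modules are usually imported. The TFFM package is left out because its import name is not given here; add it to the list if you know it. The script does not check for the bigWig files.

```python
import importlib

try:
    from shutil import which  # Python 3
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7

# Import names are assumptions; adjust them to match your installation.
REQUIRED_MODULES = ["Bio", "sklearn", "xgboost", "pandas"]
REQUIRED_TOOLS = ["bwtool"]


def check_requirements(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return the modules and command-line tools that could not be found."""
    missing_modules = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure to import is reported as a missing module.
            missing_modules.append(name)
    # A tool is missing when it cannot be located on the PATH.
    missing_tools = [tool for tool in tools if which(tool) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    missing_modules, missing_tools = check_requirements()
    print("Missing Python modules: %s" % (", ".join(missing_modules) or "none"))
    print("Missing command-line tools: %s" % (", ".join(missing_tools) or "none"))
```

The try/except import at the top keeps the script usable under both Python 2.7 and Python 3.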
Examples of how to run the DNAshapedTFBS.py tool are given in the script test.sh, provided in the test/ directory of this package.
The script feature_importance_heatmap.py plots feature importance heatmap(s) for trained classifier(s). Note that the current version only works for PSSM/TFFM + DNA shape classifiers. To get help on how to use it, type:
python2.7 feature_importance_heatmap.py -h
For information on the source tree, examples, issues, and pull requests, see
http://github.com/amathelier/DNAshapedTFBS
If you use the DNAshapedTFBS tool, please cite:
- A. Mathelier, B. Xin, T.-P. Chiu, L. Yang, R.R. Rohs, and W.W. Wasserman (2016) DNA shape features improve transcription factor binding site predictions in vivo. Cell Systems, DOI:10.1016/j.cels.2016.07.001.