CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways
CGPS is a tool that integrates multiple gene set enrichment (GSE) analysis methods using machine learning. It combines nine methods: GSEA, GSA, PADOG, PLAGE, GAGE, Globaltest, SAFE, CEPA and GANPA. CGPS integrates the p-values and ranks obtained from these nine GSE methods into a reliable ensemble score (R score) and then generates a new gene set ranking based on the R score, providing a powerful approach for prioritizing biologically relevant gene sets. For details, please refer to the paper: Chen, A., and Kong, L. (2018). CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways. Journal of Genetics and Genomics 45, 489–504.
We use gold-standard datasets with known drug-targeted pathways. The ranks and p-values obtained from all nine methods on these datasets are used to train an SVM model. The model integrates the individual results into an ensemble score, the R score, which is used to sort the pathways into the integrated result. The resulting rank indicates how similar a pathway is to the target pathways.
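The integration step can be pictured with a minimal sketch. It assumes a hypothetical input layout (one `(p_value, rank)` pair per method for each pathway) and uses a simple placeholder score, `stand_in_score`, in place of the trained SVM; it is not the CGPS model, and all names and numbers below are illustrative only.

```python
N_METHODS = 9  # number of individual methods combined by CGPS


def feature_vector(per_method):
    """Flatten nine (p_value, rank) pairs into one 18-element feature list."""
    if len(per_method) != N_METHODS:
        raise ValueError("expected %d (p-value, rank) pairs" % N_METHODS)
    p_values = [p for p, _ in per_method]
    ranks = [r for _, r in per_method]
    return p_values + ranks


def stand_in_score(features):
    """Placeholder for the trained SVM's output; NOT the CGPS model.

    Averages (1 - p) over the methods so that smaller p-values score higher.
    The rank features are ignored by this placeholder.
    """
    p_values = features[:N_METHODS]
    return sum(1.0 - p for p in p_values) / N_METHODS


def rank_by_score(results, score=stand_in_score):
    """Return pathway IDs sorted by descending score (ties broken by ID)."""
    scored = [(score(feature_vector(v)), k) for k, v in results.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [k for _, k in scored]


# Toy input: made-up numbers for three hypothetical pathway IDs.
toy = {
    "path_A": [(0.01, 1)] * N_METHODS,
    "path_B": [(0.50, 3)] * N_METHODS,
    "path_C": [(0.20, 2)] * N_METHODS,
}
ranking = rank_by_score(toy)
```

In the real tool the score would come from the trained SVM rather than from an average, but the shape of the step is the same: one feature vector per pathway, one score per vector, and a final sort.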
- R version 3.5.1
- Python version 2.7
Install the following R packages:
- Biobase
- limma
- edgeR
- gage
- GSVA
- EnrichmentBrowser
Install the following Python modules:
- numpy
- pandas
- scikit-learn (imported as sklearn)
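Before running CGPS it may help to confirm that the Python modules are importable. The helper below is a small sketch, not part of CGPS; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the check only tests that an import succeeds, not that the installed version is compatible.

```python
import importlib

# Import names of the Python modules listed above.
REQUIRED_MODULES = ["numpy", "pandas", "sklearn"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```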
Notes:
- Only KEGG PATHWAY gene sets are supported at present.
Data format: please refer to the 'test' directory. The test data were generated by microarray.
Rscript combined_methods.R expfile phefile datatype=[ma/rseq] datadir

python predict.py datadir outdir

python run_cgps.py -e expfile -p phefile -d datatype -s species -o outdir
- expfile: file containing the expression data
- phefile: file containing the experiment category of each sample in expfile
- datatype: type of the expression data, "rseq" for RNA-Seq data or "ma" for microarray data
- datadir: directory in which the results of the individual methods are saved
- species: KEGG abbreviation of the species; to find the abbreviation for your species, please refer to http://www.kegg.jp/kegg/catalog/org_list.html
- outdir: directory in which the combined results of CGPS are saved