CADA (Case Annotations and Disease Annotations) is a phenotype-driven gene prioritization tool for rare syndromes. The tool uses both disease-level annotations from the Human Phenotype Ontology (HPO) and clinical case-level annotations to construct a gene-phenotype association network. A network representation learning method is then applied to this network, and disease-causing genes are prioritized through a link prediction task.
This tool was developed as part of the master's thesis of Chengyao Peng (https://github.com/Chengyao-Peng).
The case data used in CADA is in data/processed/cases/. There you can find all cases in all_cases.tsv, which consists of cases from ClinVar in clinvar_cases.tsv and cases from our collaborators in collaborator_cases.tsv. All cases are split into training (cases_train.tsv), validation (cases_validate.tsv) and test (cases_test.tsv) sets with the ratios 60%, 20% and 20%.
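To illustrate what a 60%/20%/20% split looks like in practice, here is a minimal Python sketch. It is not the procedure used to produce the files above. The function name `split_cases`, the seeded random shuffle, and the rounding of set sizes are all assumptions made for illustration.

```python
import random


def split_cases(cases, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle cases and split them into train/validation/test lists.

    Hypothetical helper: the actual CADA split may have been produced
    differently (e.g. stratified by gene or by data source).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_train = round(len(shuffled) * ratios[0])
    n_validate = round(len(shuffled) * ratios[1])

    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]  # remainder goes to the test set
    return train, validate, test
```

Every case ends up in exactly one of the three sets, so the test set absorbs any rounding remainder.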
CADA can be installed locally with:
$ git clone https://github.com/Chengyao-Peng/CADA.git
$ cd CADA
$ pip install -e .
--hpo_terms a string of comma-separated HPO terms
--weighted whether to use the weighted knowledge graph (True or False)
--topn the number of top-ranked genes to output
--out_dir the output directory (the result is written to result.txt inside it)
CADA --out_dir cada_result --hpo_terms HP:0000573,HP:0001102,HP:0003115,HP:0001681,HP:0008067,HP:0004417 --weighted False --topn 10
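The same run could also be launched from a Python script. The sketch below only assembles the command line from the four options listed above and checks that each term follows the standard HPO identifier pattern (HP: followed by seven digits). The helper name `build_cada_command` is hypothetical and not part of CADA, and the sketch assumes the `CADA` executable is on your PATH after installation.

```python
import re
import subprocess

HPO_ID = re.compile(r"^HP:\d{7}$")  # standard HPO identifier format


def build_cada_command(hpo_terms, out_dir, weighted=False, topn=10):
    """Return the CADA command line as an argument list (hypothetical helper)."""
    invalid = [term for term in hpo_terms if not HPO_ID.match(term)]
    if invalid:
        raise ValueError(f"not valid HPO identifiers: {invalid}")
    return [
        "CADA",
        "--out_dir", str(out_dir),
        "--hpo_terms", ",".join(hpo_terms),  # CADA expects one comma-separated string
        "--weighted", str(weighted),
        "--topn", str(topn),
    ]


if __name__ == "__main__":
    command = build_cada_command(
        ["HP:0000573", "HP:0001102", "HP:0003115"], "cada_result"
    )
    # Assumes CADA is installed; check=True raises if the tool exits with an error.
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated term string.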
The result file from the example run is written to 'cada_result/result.txt'.
rank gene_id gene_name score
1 Entrez:368 ABCC6 84.62940470377605
2 Entrez:5167 ENPP1 69.57813326517741
3 Entrez:54790 TET2 57.23555533091227
4 Entrez:64132 XYLT2 57.030126889546715
5 Entrez:3949 LDLR 55.80375734965006
6 Entrez:64240 ABCG5 53.74869124094645
7 Entrez:348 APOE 53.691530545552574
8 Entrez:462 SERPINC1 51.44988568623861
9 Entrez:255738 PCSK9 50.51583385467529
10 Entrez:2162 F13A1 50.0550905863444
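For downstream use, the result file can be read back into Python. The sketch below assumes the layout shown above: one header row followed by four whitespace-separated columns (rank, gene_id, gene_name, score). The names `RankedGene` and `parse_cada_result` are illustrative and not part of CADA.

```python
from typing import List, NamedTuple


class RankedGene(NamedTuple):
    rank: int
    gene_id: str
    gene_name: str
    score: float


def parse_cada_result(text: str) -> List[RankedGene]:
    """Parse the contents of a CADA result file (assumed layout, see above)."""
    lines = [line for line in text.splitlines() if line.strip()]
    genes = []
    for line in lines[1:]:  # skip the header row
        rank, gene_id, gene_name, score = line.split()
        genes.append(RankedGene(int(rank), gene_id, gene_name, float(score)))
    return genes


if __name__ == "__main__":
    # Path taken from the example run; adjust to your own --out_dir.
    with open("cada_result/result.txt") as handle:
        for gene in parse_cada_result(handle.read()):
            print(gene.rank, gene.gene_name, round(gene.score, 2))
```

Splitting on arbitrary whitespace keeps the parser working whether the columns are separated by tabs or by spaces.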
We also provide a CADA Web Server.
See the LICENSE file for license rights and limitations (GNU GPLv3).