Filter by frequency
bschilder opened this issue · 3 comments
Disease frequency is currently a key factor in whether a therapeutic candidate is viable (until N=1 legislation is passed).
Frequency types
From the annotations provided by HPO, we currently have:
phenotype-disease frequency
- Data source: phenotype.hpoa
- How frequently a given phenotype is associated with a disease (across individuals)
- https://neurogenomics.github.io/RareDiseasePrioritisation/reports/HPO_annotations#Plot_proportions
- ~2/3 of phenotype-disease pairs have this annotation.
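The frequency column in phenotype.hpoa mixes several encodings: HPO frequency terms (e.g. HP:0040281 "Very frequent"), fractions such as 7/13, and percentages such as 45%. Below is a minimal sketch of how these could be collapsed into a single proportion and how the "~2/3 of pairs" coverage could be computed. It assumes the relevant columns are named `database_id`, `hpo_id` and `frequency`, and it represents each HPO frequency term by the midpoint of its defined range, which is an arbitrary choice made only for illustration. `parse_hpo_frequency` and `annotation_coverage` are hypothetical helpers, and the example rows are made up.

```python
import re
from typing import Dict, List, Optional

# Midpoints of the ranges defined for the HPO frequency terms.
# Using midpoints is an illustrative assumption, not an HPO convention.
HPO_FREQUENCY_TERMS = {
    "HP:0040280": 1.00,   # Obligate (100%)
    "HP:0040281": 0.895,  # Very frequent (80-99%)
    "HP:0040282": 0.545,  # Frequent (30-79%)
    "HP:0040283": 0.17,   # Occasional (5-29%)
    "HP:0040284": 0.025,  # Very rare (1-4%)
    "HP:0040285": 0.00,   # Excluded (0%)
}


def parse_hpo_frequency(value: Optional[str]) -> Optional[float]:
    """Convert one frequency entry to a proportion in [0, 1], or None if missing."""
    if value is None or value.strip() == "":
        return None
    value = value.strip()
    if value in HPO_FREQUENCY_TERMS:
        return HPO_FREQUENCY_TERMS[value]
    fraction = re.fullmatch(r"(\d+)/(\d+)", value)
    if fraction:
        numerator, denominator = map(int, fraction.groups())
        return numerator / denominator if denominator else None
    percent = re.fullmatch(r"(\d+(?:\.\d+)?)%", value)
    if percent:
        return float(percent.group(1)) / 100
    return None  # unrecognised encoding


def annotation_coverage(rows: List[Dict[str, str]]) -> float:
    """Fraction of unique (disease, phenotype) pairs with a parseable frequency."""
    all_pairs = set()
    annotated = set()
    for row in rows:
        pair = (row["database_id"], row["hpo_id"])
        all_pairs.add(pair)
        if parse_hpo_frequency(row.get("frequency")) is not None:
            annotated.add(pair)
    return len(annotated) / len(all_pairs) if all_pairs else 0.0


# Made-up example rows (disease IDs are placeholders).
rows = [
    {"database_id": "OMIM:000001", "hpo_id": "HP:0001250", "frequency": "HP:0040281"},
    {"database_id": "OMIM:000001", "hpo_id": "HP:0001263", "frequency": "7/13"},
    {"database_id": "OMIM:000002", "hpo_id": "HP:0001250", "frequency": ""},
]
print(annotation_coverage(rows))
```

Collapsing the term-based entries to a single number discards the range information, so keeping the original value alongside the parsed proportion may be worthwhile.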
gene-phenotype frequency
- Data source: genes_to_phenotype.txt
- I'm still unclear on the exact meaning of this frequency column. When we spoke to Peter about it, I believe he said it could be a mixture of gene-phenotype, gene-disease and gene-phenotype-disease frequencies.
- 32.7% of gene-phenotype-disease triads have this annotation.
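Because genes_to_phenotype.txt can list the same gene-phenotype-disease combination more than once, the coverage figure is best computed over unique triads rather than raw rows. A minimal sketch follows. It assumes tab-separated columns named `ncbi_gene_id`, `hpo_id`, `frequency` and `disease_id`, and that a missing frequency is written as `-` or left empty; both details should be checked against the release actually in use. `triad_frequency_coverage` is a hypothetical helper and the rows are invented.

```python
import csv
import io

# Invented excerpt in the assumed format; real releases may differ.
EXAMPLE_TSV = (
    "ncbi_gene_id\tgene_symbol\thpo_id\thpo_name\tfrequency\tdisease_id\n"
    "1\tGENEA\tHP:0001250\tSeizure\t3/10\tOMIM:000001\n"
    "1\tGENEA\tHP:0001250\tSeizure\t3/10\tOMIM:000001\n"  # duplicated row
    "1\tGENEA\tHP:0001263\tGlobal developmental delay\t-\tOMIM:000001\n"
    "2\tGENEB\tHP:0001250\tSeizure\t-\tOMIM:000002\n"
)

# Values treated as "no frequency annotation" (assumption).
MISSING = {"", "-"}


def triad_frequency_coverage(handle) -> float:
    """Fraction of unique (gene, phenotype, disease) triads with a frequency value."""
    reader = csv.DictReader(handle, delimiter="\t")
    triads = set()
    annotated = set()
    for row in reader:
        triad = (row["ncbi_gene_id"], row["hpo_id"], row["disease_id"])
        triads.add(triad)
        if row["frequency"].strip() not in MISSING:
            annotated.add(triad)
    return len(annotated) / len(triads) if triads else 0.0


print(triad_frequency_coverage(io.StringIO(EXAMPLE_TSV)))
```

This only measures coverage. It says nothing about which of the gene-phenotype, gene-disease or gene-phenotype-disease interpretations a given value reflects.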
phenotype frequency
We do not currently have data on absolute phenotype frequency in the general population.
We will need to gather this.
disease frequency
We do not currently have data on absolute disease frequency in the general population.
We will need to gather this.
Potential resources
Assessing Orphanet data
See the following R Markdown report for a complete assessment of the Orphanet prevalence data:
https://neurogenomics.github.io/RareDiseasePrioritisation/reports/orphanet_prevalence
Takeaways:
- Prevalence is a more complex concept than we initially thought (e.g. Orphanet distinguishes point prevalence, birth prevalence, lifetime prevalence and annual incidence).
- While Orphanet is fairly comprehensive, it only contains prevalence data for a subset of diseases/phenotypes. It is unclear whether this reflects incompleteness of Orphanet or a general lack of prevalence data.
- Mapping IDs to allow cross-database merging is hard. I can currently map >99% of HPO diseases, but only 4% of HPO phenotypes.
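A rough way to track mapping rates like these is to count how many HPO-side IDs have at least one entry in a cross-reference table. The sketch below assumes such a table has already been built (for example from the cross-references Orphanet distributes, or from a mapping resource such as Mondo) and represents it as a plain dict from source ID to a set of target IDs. `mapping_rate` is a hypothetical helper and all IDs are placeholders.

```python
from typing import Dict, Iterable, Set


def mapping_rate(ids: Iterable[str], xrefs: Dict[str, Set[str]]) -> float:
    """Fraction of unique source IDs with at least one cross-reference."""
    unique_ids = set(ids)
    if not unique_ids:
        return 0.0
    mapped = {i for i in unique_ids if xrefs.get(i)}
    return len(mapped) / len(unique_ids)


# Placeholder cross-reference table (source ID -> set of target IDs).
xrefs = {
    "OMIM:000001": {"ORPHA:0001"},
    "OMIM:000002": {"ORPHA:0002", "ORPHA:0003"},  # one-to-many mappings occur
}
hpo_diseases = ["OMIM:000001", "OMIM:000002", "OMIM:000003"]
print(mapping_rate(hpo_diseases, xrefs))
```

A high mapping rate does not guarantee a clean merge. One-to-many mappings (like `OMIM:000002` above) duplicate rows when joined, so they may need to be resolved or aggregated before prevalence values are attached.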
There are various ways of getting at prevalence. One might be to look at the data behind pLOF intolerance, but it is not all that clear whether that would correspond with phenotype prevalence (e.g. the phenotype might be caused by CNVs, repeat expansions or missense mutations). I think it's probably too big an ask for this paper; it would need a well-thought-out project proposal with ideas on how to validate it. Do you agree?
Mutation frequency and phenotype frequency (or disease frequency) are two related but very different concepts. Equating the two only works when the phenotype/disease is truly monogenic (a single causal gene across all individuals) and 100% penetrant (the causal mutation always produces the phenotype/disease, regardless of genetic background or environmental exposures). That isn't the case for any of the phenotypes for which we have cell-type enrichment results, as we only included phenotypes with >=4 genes. Even if we expanded to phenotypes with n=1 gene, this wouldn't rule out additional causal genes that haven't yet been discovered, or penetrance below 100% (we have very limited data on this).
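To make this concrete, consider a simple back-of-the-envelope model. This is an illustrative assumption, not something fitted to our data. It treats phenotype prevalence as the sum, over causal genes, of the fraction of people carrying a causal genotype multiplied by that gene's penetrance, plus any cases not explained by known genes. Summing like this only makes sense if the causes are rare and rarely co-occur. Carrier frequency equals prevalence only in the special case of one gene, full penetrance and no unexplained cases. `CausalGene` and `expected_prevalence` are hypothetical, and every number is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CausalGene:
    name: str                 # hypothetical gene label
    carrier_frequency: float  # fraction of the population with a causal genotype
    penetrance: float         # P(phenotype | causal genotype)


def expected_prevalence(genes: List[CausalGene], unexplained: float = 0.0) -> float:
    """Approximate prevalence, assuming rare, non-overlapping causes whose contributions add."""
    return sum(g.carrier_frequency * g.penetrance for g in genes) + unexplained


# Special case where mutation frequency equals phenotype frequency.
monogenic_full_penetrance = [CausalGene("GENE_A", 1e-4, 1.0)]

# Same carrier frequency for GENE_A, but incomplete penetrance,
# a second causal gene and some cases with no known genetic cause.
two_genes_incomplete = [
    CausalGene("GENE_A", 1e-4, 0.5),
    CausalGene("GENE_B", 2e-5, 0.8),
]

print(expected_prevalence(monogenic_full_penetrance))
print(expected_prevalence(two_genes_incomplete, unexplained=1e-5))
```

In the second case, GENE_A's carrier frequency on its own would overestimate prevalence, while dropping GENE_B and the unexplained cases would underestimate it. This is why mutation frequency alone is a poor proxy.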
Currently, I propose we use frequency data from Orphanet as a guide when selecting top therapeutic candidates, rather than a hard requirement (i.e. removing all disease/phenotype candidates for which we don't have prevalence data).
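One way to treat Orphanet prevalence as a guide rather than a filter is to convert its prevalence classes into a rough numeric value and use that value only to order candidates, never to drop them. In this sketch, several details are assumptions: the exact class strings, the representative value chosen for each class (the upper end of the range, or the lower bound for the open-ended class), the primary `score` field, and the policy of preferring higher prevalence when scores tie. `rank_candidates` is a hypothetical helper and the candidates are made up.

```python
from typing import Dict, List, Optional

# Representative value per Orphanet prevalence class (labels assumed):
# upper end of each range, or the lower bound for the open-ended ">1 / 1000".
PREVALENCE_CLASS_VALUE = {
    "<1 / 1 000 000": 1e-6,
    "1-9 / 1 000 000": 9e-6,
    "1-9 / 100 000": 9e-5,
    "1-5 / 10 000": 5e-4,
    "6-9 / 10 000": 9e-4,
    ">1 / 1000": 1e-3,
}


def class_to_value(prevalence_class: Optional[str]) -> Optional[float]:
    """Map a class label to a number; None for missing or unrecognised labels."""
    if prevalence_class is None:
        return None
    return PREVALENCE_CLASS_VALUE.get(prevalence_class.strip())


def rank_candidates(candidates: List[Dict]) -> List[Dict]:
    """Sort by score (desc), then known prevalence (desc); keep every candidate."""
    def sort_key(candidate: Dict):
        prevalence = class_to_value(candidate.get("prevalence_class"))
        lacks_prevalence = prevalence is None  # False sorts before True
        return (-candidate["score"], lacks_prevalence, -(prevalence or 0.0))
    return sorted(candidates, key=sort_key)


candidates = [
    {"id": "disease_1", "score": 0.9, "prevalence_class": None},
    {"id": "disease_2", "score": 0.9, "prevalence_class": "1-9 / 100 000"},
    {"id": "disease_3", "score": 0.7, "prevalence_class": "1-5 / 10 000"},
]
print([c["id"] for c in rank_candidates(candidates)])
```

Here `disease_1` stays in the list despite lacking prevalence data. It is simply ranked below an equally scored candidate that has prevalence data, which matches the "guide, not hard requirement" idea.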