Feasibility of using custom allele frequency data
Thanks and congratulations on this great tool.
Given that gnomAD predominantly represents certain populations, how could we adapt GeniE to incorporate allele frequencies from our custom dataset, which focuses on an underrepresented population?
Are there considerations for sample size, population structure, or statistical methods, as well as other challenges to keep in mind (development effort, for example)?
Regards
Thank you for your questions. I completely agree that representation is one of the biggest hurdles to accurate genetic prevalence estimates for all diseases. We are already discussing methods for integrating other databases that account for things like database overlap (duplicates), relatedness, and differences in variant-calling methods.
One potential option is to allow people to repeat estimates across multiple databases and return separate results, though this could make interpretation challenging when the results differ widely. Another option is federation, which the gnomAD team is actively working on. Federation allows other population databases (many of which cover populations that are currently underrepresented) to use gnomAD best practices, and then have their aggregated AFs pulled into gnomAD. This would allow communities to maintain their own population databases, while also improving global AFs, and subsequently GeniE’s estimates.
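As a rough illustration of the first option, and not a description of how GeniE is implemented, here is a minimal Python sketch. It assumes a recessive disease under Hardy-Weinberg equilibrium, where prevalence is approximated as the square of the summed allele frequencies of the causal variants. The database names, variant IDs, counts, and helper functions are all hypothetical. The pooled estimate also assumes there are no duplicate or related samples across databases and that every database reports every variant, which are exactly the issues mentioned above.

```python
from dataclasses import dataclass


@dataclass
class VariantCount:
    ac: int  # allele count: number of alternate alleles observed
    an: int  # allele number: total number of alleles genotyped at the site


def cumulative_af(variants):
    """Sum per-variant allele frequencies, skipping sites with no genotypes."""
    return sum(v.ac / v.an for v in variants if v.an > 0)


def recessive_prevalence(q):
    """Hardy-Weinberg approximation for a recessive disease: prevalence = q^2."""
    return q ** 2


def estimate_per_database(per_db):
    """Return one separate prevalence estimate per database."""
    return {
        name: recessive_prevalence(cumulative_af(counts.values()))
        for name, counts in per_db.items()
    }


def pool_counts(per_db):
    """Naively pool AC and AN per variant across databases.

    Assumes no sample overlap or relatedness between databases, and that
    each database reports every variant (a variant missing from a database
    would need that database's AN at the site, which is not modeled here).
    """
    pooled = {}
    for counts in per_db.values():
        for variant_id, vc in counts.items():
            prev = pooled.get(variant_id, VariantCount(0, 0))
            pooled[variant_id] = VariantCount(prev.ac + vc.ac, prev.an + vc.an)
    return pooled


# Hypothetical counts for two causal variants in two databases.
per_db = {
    "reference_db": {"var1": VariantCount(10, 10000), "var2": VariantCount(30, 10000)},
    "custom_db": {"var1": VariantCount(4, 2000), "var2": VariantCount(16, 2000)},
}

separate = estimate_per_database(per_db)
pooled = recessive_prevalence(cumulative_af(pool_counts(per_db).values()))

for name, prevalence in separate.items():
    print(f"{name}: {prevalence:.2e}")
print(f"pooled: {pooled:.2e}")
```

With these made-up numbers the two databases give noticeably different estimates, and the pooled value is dominated by the larger database, which is the interpretation problem described above.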
We are still in the planning phases of these efforts, so I can't provide you with specific requirements at this time (other than that federated gnomAD will require using gnomAD's pipeline). If you are interested in participating, please feel free to reply to prev-genie@broadinstitute.org with more details about your database (size, types of data, makeup of the individuals included) and we can discuss more specifics.