broadinstitute/dig-diabetes-portal

Could we maintain a comprehensive variant list, whether or not we have significant data?

Opened this issue · 3 comments

Right now there are lots of variants for which we have no records. Understandably we will not have statistically significant data with relevance to diabetes for every possible human variant, but would it be possible to have a record in the database for the variant? To be able to say to the investigating scientist "yes, that's a variant, we know about it, and we have detected no indications that this variant is relevant to diabetes" would be more reassuring, I think. The alternative, to show nothing at all, would leave someone hitting our database to wonder whether we just don't know about this variant, or whether the database needs to be updated.

I realize of course this is more a question for the backend than the front end, though the front end needs to refine the way that sparsely populated records are handled.

That's going to be hard to maintain.

What do you mean by "every possible variant"?

We could certainly link to public DBs (e.g. dbSNP), but I wouldn't want to
maintain a mirror of it. Similar stories would apply to large sequencing
studies (e.g. the 91K).

Plus, there are then two issues: (a) we won't know all variation in the
world until we sequence everyone, and (b) even if we did, I think we are
only interested in showing those in "our" data, since the point of the
portal is to query data of relevance to T2D.

Probably the best option is to try to link our site via the SOA model to
other resources of variation, and to pull those into the pages in a
seamless way.

Lots of threads to which that broader aim is relevant; something I am
working on but will take a bit of time to flesh out.

In reply to Ben R. Alexander's message of Sun, Oct 26, 2014 at 12:24 PM.

Instead of "every possible" I should've said "every variant thus far identified". And I appreciate that the number of known variants is constantly changing, and definitely I'm not suggesting that we mirror the information already available in dbSNP -- surely that would be kind of a waste of our efforts.

Maybe this request does boil down mostly to a UI change. If we could simply identify variants for which we have no information and state "rs12345 has no known impact on the T2D disease process" (maybe along with a link that takes people to some other database) then we wouldn't leave people hanging if they type in a variant that isn't in our database.
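To make the UI idea concrete, here is a minimal sketch of the message the portal could show for a variant it has no data on. The function name, the exact wording, and the dbSNP URL pattern are all assumptions for illustration, not anything taken from the portal code.

```python
import re

# Assumption: the public dbSNP landing-page pattern; not taken from this thread.
DBSNP_URL = "https://www.ncbi.nlm.nih.gov/snp/{rsid}"

# An rsID is the letters "rs" followed by digits, e.g. rs12345.
RSID_PATTERN = re.compile(r"rs\d+")


def describe_missing_variant(query: str) -> str:
    """Build the text shown when a searched variant is not in the portal's data.

    Hypothetical helper: the name and wording are illustrative only.
    """
    rsid = query.strip().lower()
    if not RSID_PATTERN.fullmatch(rsid):
        # Not an rsID at all, so there is nothing sensible to link to.
        return (
            f"'{query.strip()}' does not look like an rsID "
            "(expected something like rs12345)."
        )
    return (
        f"{rsid} is not in the portal's data sets, so we have no evidence "
        "linking it to T2D. General information may be available from dbSNP: "
        + DBSNP_URL.format(rsid=rsid)
    )


if __name__ == "__main__":
    print(describe_missing_variant("rs12345"))
```

The wording deliberately says "not in the portal's data sets" rather than "no known impact", since absence from our data is all we can actually assert.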

On the other hand, I wonder whether, going forward, as we accumulate more and more data sets, we will end up with more references that have varying degrees of completeness. Maybe variants that have been detected but don't reach some level of significance that allows us to draw meaningful conclusions? Others that have been identified but which haven't run through the pipeline yet? I'm just making up these corner cases, but I'm wondering if there will be cases where we have a little information about a variant but not enough for a full presentation. And in those cases would we be able to show what we've got, or is it better to only present variants if we can tell their story in full detail?
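The corner cases above could be modeled as an explicit "coverage" state that the front end switches on. This is only a sketch: the record fields, the state names, and the use of the conventional genome-wide significance cutoff of 5e-8 are assumptions, since I don't know how the pipeline actually represents these cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Assumption: the conventional GWAS genome-wide significance cutoff.
GENOME_WIDE_SIGNIFICANCE = 5e-8


class Coverage(Enum):
    NOT_IN_DATA = "not in any portal data set"
    PENDING = "identified, but not yet run through the pipeline"
    BELOW_THRESHOLD = "analyzed, association below the significance threshold"
    SIGNIFICANT = "analyzed, association reaches the significance threshold"


@dataclass
class VariantRecord:
    """Hypothetical shape of a portal record; the field names are illustrative."""
    rsid: str
    processed: bool           # has the analysis pipeline run for this variant?
    p_value: Optional[float]  # None when no association result exists


def classify(record: Optional[VariantRecord],
             threshold: float = GENOME_WIDE_SIGNIFICANCE) -> Coverage:
    """Map a record (or its absence) to the state the UI should present."""
    if record is None:
        return Coverage.NOT_IN_DATA
    if not record.processed or record.p_value is None:
        return Coverage.PENDING
    if record.p_value < threshold:
        return Coverage.SIGNIFICANT
    return Coverage.BELOW_THRESHOLD


if __name__ == "__main__":
    for rec in (None, VariantRecord("rs1", False, None),
                VariantRecord("rs2", True, 0.2)):
        print(classify(rec).value)
```

With a state like this, "show what we've got" versus "show nothing" becomes a per-state presentation decision rather than an all-or-nothing one.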

In any case it's true that I don't have enough understanding of the pipeline to propose any sort of a coherent strategy. I only know that it bugs me when I see a variant in a paper, type it into the portal, and get back nothing. But you can tell me how the portal should properly be responding and I will make sure it's implemented.

I think you have to distinguish between "variants in our data" and
"variants the world knows about".

We absolutely will have data for every variant in our data. For those,
we'll include all of the information we have. For those not in our data,
we'll have nothing, except for what we can pull from external databases. So
I really think this is about accessing external services, and integrating
those into our pages.
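A rough sketch of that split, assuming the external service is wrapped in a plain callable so the page logic does not depend on any particular database. All names here are hypothetical, and the in-memory dict merely stands in for our own data store.

```python
from typing import Callable, Dict, Optional

# Hypothetical interface: given an rsID, return annotation from an external
# resource (e.g. dbSNP), or None if that resource does not know the variant.
ExternalLookup = Callable[[str], Optional[dict]]


def lookup_variant(rsid: str,
                   portal_data: Dict[str, dict],
                   external: ExternalLookup) -> dict:
    """Prefer our own data; fall back to an external service otherwise."""
    record = portal_data.get(rsid)
    if record is not None:
        # "Variants in our data": show everything we have.
        return {"rsid": rsid, "source": "portal", "data": record}
    try:
        annotation = external(rsid)
    except Exception:
        # An outage of the external service should not break the page.
        annotation = None
    if annotation is not None:
        # "Variants the world knows about": show only what was pulled in.
        return {"rsid": rsid, "source": "external", "data": annotation}
    return {"rsid": rsid, "source": "none", "data": None}


if __name__ == "__main__":
    ours = {"rs111": {"p_value": 3e-9}}
    stub = lambda rsid: {"gene": "EXAMPLE"} if rsid == "rs222" else None
    for query in ("rs111", "rs222", "rs333"):
        print(lookup_variant(query, ours, stub)["source"])
```

The page can then branch on `source` to decide between the full presentation, an externally sourced summary, or a "not found anywhere" message.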

In reply to Ben R. Alexander's message of Mon, Oct 27, 2014 at 10:52 AM.