ranking-agent/strider

Why is IRF5 not showing up?

cbizon opened this issue · 12 comments

Query:

query = {
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": ["biolink:related_to"]
        },
        "e01": {
          "subject": "n01",
          "object": "n02",
          "predicates": ["biolink:related_to"]
        }
      },
      "nodes": {
        "n00": {
          "ids": ["PUBCHEM.COMPOUND:644073"],
          "categories": ["biolink:ChemicalEntity"]
        },
        "n01": {
          "categories": ["biolink:Gene"]
        },
        "n02": {
          "ids": ["HP:0000217"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"]
        }
      }
    }
  }
}

Genes connecting a particular drug to dry mouth.

Results:
https://arax.ncats.io/?r=18c9bfce-266c-412e-98aa-ee468e868c93

We have 1 result, BTE has 2. How come we're not getting the IRF5 one?

Possibly a repeat of #361

Also, ARAX is finding an SLC gene and CD69

Another example: NCATSTranslator/testing#189. ARAX and others are returning things that we are not, even though we query those things.

Strider isn't getting IRF5 because that result comes from Text Mining Provider Targeted. That KP isn't being used because it doesn't have an x-trapi block in its OpenAPI spec. I've opened an issue here: NCATSTranslator/Text-Mining-Provider-Roadmap#95
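For anyone who wants to check a registration by hand, here is a minimal sketch of that kind of test. It assumes the spec has already been loaded into a dict and that the TRAPI metadata sits under `info` as `x-trapi`, as in the TRAPI OpenAPI template. The function name and the two sample specs are made up for illustration; this is not Strider's actual KP-selection code.

```python
def has_trapi_block(spec: dict) -> bool:
    """Return True if the OpenAPI document declares a non-empty info.x-trapi mapping."""
    info = spec.get("info")
    if not isinstance(info, dict):
        return False
    block = info.get("x-trapi")
    return isinstance(block, dict) and bool(block)


# Hypothetical specs, for illustration only.
registered = {
    "openapi": "3.0.1",
    "info": {"title": "Example KP", "x-trapi": {"version": "1.2.0"}},
}
unregistered = {
    "openapi": "3.0.1",
    "info": {"title": "Example KP without TRAPI metadata"},
}

# Keep only the specs that would be recognised as TRAPI services.
usable = [s["info"]["title"] for s in (registered, unregistered) if has_trapi_block(s)]
print(usable)
```

A KP whose spec fails a check like this would be skipped at registry-reading time, which is consistent with a result never appearing even though the KP itself has the data.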

Is this one that should be coming in through the service provider interface?

If it is supposed to, it isn't. But if it does start to, that raises another issue: we would be hitting text mining twice, since it is also registered as a KP, and once they fix their SmartAPI registration we'll start using it directly.

Ok, after some deeper digging, we are getting access to text mining through Service Provider, so that's not the issue. I believe the issue is that we're normalizing Xerostomia (HP:0000217) and ending up sending it with only the category "PhenotypicFeature", which is more specific, so BTE isn't returning IRF5. I've confirmed this by sending queries directly to BTE with the two different categories, and I get different answers. (PhenotypicFeature returns only Amylases, and DiseaseOrPhenotypicFeature returns both Amylases and IRF5.)
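A rough sketch of that comparison, in case someone wants to repeat it against another KP. All helper names here are hypothetical, the endpoint URL is left to the caller, and the demo at the bottom uses an offline stand-in with placeholder ids rather than real BTE output. It assumes the usual TRAPI response shape, where each result carries `node_bindings` keyed by query-graph node.

```python
import copy
import json
import urllib.request


def with_category(query: dict, node_key: str, category: str) -> dict:
    """Return a deep copy of a TRAPI query with one node's categories replaced."""
    variant = copy.deepcopy(query)
    variant["message"]["query_graph"]["nodes"][node_key]["categories"] = [category]
    return variant


def bound_ids(response: dict, node_key: str) -> set:
    """Collect the CURIEs bound to node_key across all results in a TRAPI response."""
    ids = set()
    for result in response.get("message", {}).get("results") or []:
        for binding in result.get("node_bindings", {}).get(node_key, []):
            ids.add(binding["id"])
    return ids


def post_trapi(url: str, query: dict) -> dict:
    """POST a query to a TRAPI query endpoint; the URL is supplied by the caller."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def compare_categories(query, node_key, answer_key, categories, send):
    """Run one query variant per category and return {category: bound answer ids}."""
    return {c: bound_ids(send(with_category(query, node_key, c)), answer_key) for c in categories}


# Offline demo: a trimmed query and a stand-in KP that returns placeholder ids only.
base = {"message": {"query_graph": {"edges": {}, "nodes": {
    "n02": {"ids": ["HP:0000217"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]}}}}}


def fake_send(query):
    category = query["message"]["query_graph"]["nodes"]["n02"]["categories"][0]
    genes = ["EXAMPLE:1"] if category == "biolink:PhenotypicFeature" else ["EXAMPLE:1", "EXAMPLE:2"]
    return {"message": {"results": [{"node_bindings": {"n01": [{"id": g}]}} for g in genes]}}


narrow, broad = "biolink:PhenotypicFeature", "biolink:DiseaseOrPhenotypicFeature"
answers = compare_categories(base, "n02", "n01", [narrow, broad], fake_send)
missing = answers[broad] - answers[narrow]
print(sorted(missing))
```

To run it for real, pass `lambda q: post_trapi(url, q)` as `send`, with `url` pointing at the KP's query endpoint. Anything left in `missing` is an answer that disappears when the narrower category is sent.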

This raises an intriguing problem. I feel like normalizing is the right thing to do, and we're actually getting a more accurate category for that node, but in the end it might be hurting us? Or is IRF5 actually not a good answer because it requires a path more on the "Disease" side?

Hmm, interesting. What happens if you send the category "disease"? I don't think Xerostomia would ever be considered a disease, but I could be wrong. Overall, I think this is a BTE bug & would make an issue there about it.

TBH, I think sending categories on nodes with identifiers is kind of goofy anyway: the identifier points to the thing, which is what it is.

Sending disease still returns both answers.

If this is a bug in BTE, then you're saying IRF5 is actually a bad answer to this query. What about ARAX and the SLC gene and CD69? I'm assuming the same kind of thing might be happening on that side, but I haven't looked into it yet.

I agree sending categories along with identifiers is silly. I'm curious how the KPs are set up to be sending extra answers when the category is different.

"If this is a bug in BTE, then you're saying IRF5 is actually a bad answer to this query."

I don't think I'm saying that? I actually don't know if it's good or bad, just that I think we should get the same answers no matter what type we send in there. So either IRF5 should come out both ways or neither.

It wouldn't surprise me if the SLC gene was a similar issue.