ranking-agent/strider

Workflow B


This issue is to kindly request that Ranking Agent test Workflow B. Here is an example TRAPI query that runs when sent through the ARS, but the only ARA that returns responses is ARAX. The ARS issue is apparently related to time-out errors that will be resolved with the release of TRAPI 1.2 and its support for asynchronous queries, but that will not happen until early September. Thus, Hao and I would like to test direct ARA queries, and we are requesting your help. Note that we are also testing direct ARA queries using multihop.py, but strider tests would be helpful, as multihop.py doesn't support sophisticated features such as ranking.

The README file provides an overview of the workflow, including suggested input CURIEs. Note that the first predicate will be changed from biolink:correlated_with to biolink:has_real_world_evidence_of_association_with; I will be sure to alert you to this change after the clinical KPs implement the new predicate.

We are also testing query constraints, such as replacing biolink:related_to with biolink:interacts_with and biolink:Gene with biolink:BiologicalProcessOrPathway. Other options are welcome. For instance, if the full three-hop query does not run, or perhaps even if it does, Chris and I discussed testing the first two hops using the answer coalescer.
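
As a rough illustration of the substitutions described above, the following Python sketch rewrites a TRAPI query graph by swapping categories and predicates. The helper name `swap_terms` and the two-node example graph are hypothetical and are not taken from the Workflow B repository; only the Biolink terms come from this issue.

```python
import copy


def swap_terms(query_graph, category_swaps=None, predicate_swaps=None):
    """Return a copy of query_graph with categories/predicates substituted.

    Hypothetical helper for illustration; terms without a swap are kept as-is.
    """
    category_swaps = category_swaps or {}
    predicate_swaps = predicate_swaps or {}
    variant = copy.deepcopy(query_graph)  # leave the caller's graph untouched
    for node in variant["nodes"].values():
        if "categories" in node:
            node["categories"] = [category_swaps.get(c, c) for c in node["categories"]]
    for edge in variant["edges"].values():
        if "predicates" in edge:
            edge["predicates"] = [predicate_swaps.get(p, p) for p in edge["predicates"]]
    return variant


# Assumed fragment of a query graph (last hop only), used as test input.
base = {
    "nodes": {
        "n2": {"categories": ["biolink:Gene"]},
        "n3": {"categories": ["biolink:ChemicalEntity"]},
    },
    "edges": {
        "e03": {"subject": "n2", "object": "n3", "predicates": ["biolink:related_to"]},
    },
}

variant = swap_terms(
    base,
    category_swaps={"biolink:Gene": "biolink:BiologicalProcessOrPathway"},
    predicate_swaps={"biolink:related_to": "biolink:interacts_with"},
)
```

Because the function works on a deep copy, the same base graph can be reused to generate each constrained variant.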

We realize that you all are extremely busy, but we would greatly appreciate a few quick tests. Please feel free to reach out to me or Hao with questions.

  • B.0: 25728 results in <5 minutes
  • B.1a: 14186 results in 2.8 minutes, using ICEES DILI!
  • B.1b: 93440 results
  • B.1c: 28646 results using ICEES DILI for e01
  • B.1d: 174931 results - this one took a while to merge results, but still <10 minutes
  • B.1e: 1142 results in 3.5 minutes
  • B.2: 500 results in 54.4 seconds*
  • B.2a: 44 results in 2.4 minutes
  • B.2b: 113799 results in 3.56 minutes
  • B.2c: 370 results with e0 predicate biolink:treats, none from ICEES DILI
  • B.2d: 161627 results in 4.10 minutes
  • B.2e: 44 results in 2.4 minutes
  • B.3a: is identical to B.2?
  • B.3b: 500 results in 24.9 seconds*
  • B.3c: 0 results - contacts COHD, ICEES Asthma, and ICEES DILI for e01, gets nothing
  • B.3d: 0 results - contacts COHD, ICEES Asthma, and ICEES DILI for e01, gets nothing
  • B.4a: 6265 results in 20.3 seconds*
  • B.4b: 2 results in 5.3 seconds*
  • B.4c: 278 results in 8.4 seconds*
  • B.4d: 0 results - contacts RTX KG2 for e0, gets nothing
  • B.5: 188 results in 13.9 seconds
  • B.6: 135 results in 8.2 seconds
  • B.7: 24816 results in 3.4 minutes

* COHD complains that "biolink:ChemicalSubstance was not recognized as a biolink category". See NCATSTranslator/testing#115 (comment).

Nice. Heads up that a bunch of these workflows have just changed in the repo.

Aragorn is currently not returning results for the B.1x and B.2x async queries.

For example, B.1a is showing errors in the log of its response:

  "logs": [
    {
      "code": null,
      "level": "ERROR",
      "message": "strider error: HTML error status code 500 returned.",
      "time stamp": "09/23/2021-07:33:48",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "answer_coalesce error: HTML error status code 500 returned.",
      "time stamp": "09/23/2021-07:33:48",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "Exception: 'NoneType' object is not subscriptable",
      "time stamp": "09/23/2021-07:33:49",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "omnicorp error: HTML error status code 500 returned.",
      "time stamp": "09/23/2021-07:33:49",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "Exception: 'NoneType' object is not subscriptable",
      "time stamp": "09/23/2021-07:33:49",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "weight error: HTML error status code 500 returned.",
      "time stamp": "09/23/2021-07:33:49",
      "timestamp": null
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "Exception: 'NoneType' object is not subscriptable",
      "time stamp": "09/23/2021-07:33:49"
    },
    {
      "code": null,
      "level": "ERROR",
      "message": "score error: HTML error status code 500 returned.",
      "time stamp": "09/23/2021-07:33:49"
    }
  ],

And B.2a says "Query commenced. Will send result to https://ars-dev.transltr.io/ars/api/messages/a7e552dc-5b55-4a08-b6e4-b2f6c81cea89" (several hours after being run).
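
For triaging responses like the B.1a log above, a small Python sketch such as the following could pull out the ERROR entries and count which downstream service each "HTML error status code" message names. The function names are hypothetical, and the sample is a shortened copy of the log above; it assumes messages follow the "<service> error: ..." pattern seen there.

```python
from collections import Counter


def error_messages(response):
    """Collect the message text of every ERROR-level log entry."""
    logs = response.get("logs") or []
    return [entry.get("message", "") for entry in logs if entry.get("level") == "ERROR"]


def failing_services(messages):
    """Count messages of the form '<service> error: ...' by service name."""
    counts = Counter()
    for message in messages:
        if " error: " in message:
            counts[message.split(" error: ", 1)[0]] += 1
    return counts


# Shortened copy of the B.1a log entries quoted in this comment.
sample = {
    "logs": [
        {"code": None, "level": "ERROR",
         "message": "strider error: HTML error status code 500 returned."},
        {"code": None, "level": "ERROR",
         "message": "answer_coalesce error: HTML error status code 500 returned."},
        {"code": None, "level": "ERROR",
         "message": "Exception: 'NoneType' object is not subscriptable"},
        {"code": None, "level": "ERROR",
         "message": "omnicorp error: HTML error status code 500 returned."},
    ]
}

services = failing_services(error_messages(sample))
```

Entries such as the 'NoneType' exception carry no service prefix, so they are collected by `error_messages` but not counted by `failing_services`.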

@patrickkwang : B.1c and B.2c should run just fine and return results from ICEES DILI, but not COHD. The direct ARAGORN query below ran a couple of weeks ago.

curl -XPOST https://aragorn.renci.org/1.1/query -d '{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["MESH:D056487"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"]
        },
        "n1": {
          "categories": ["biolink:DiseaseOrPhenotypicFeature"]
        },
        "n2": {
          "categories": ["biolink:Gene"]
        },
        "n3": {
          "categories": ["biolink:ChemicalEntity"]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:correlated_with"]
        },
        "e02": {
          "subject": "n2",
          "object": "n1",
          "predicates": ["biolink:gene_associated_with_condition"]
        },
        "e03": {
          "subject": "n2",
          "object": "n3",
          "predicates": ["biolink:related_to"]
        }
      }
    }
  },
  "workflow": ["lookup", "connect_knodes", "score"]
}' -H "Content-Type: application/json"

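For anyone who prefers scripting the test, here is a minimal Python sketch of the same request as the curl command above, using only the standard library. The function names and the timeout value are assumptions, and the endpoint is simply the one from the curl command, which may no longer be live.

```python
import json
import urllib.request

# Endpoint copied from the curl command; availability is not guaranteed.
ARAGORN_URL = "https://aragorn.renci.org/1.1/query"


def build_query():
    """Build the same three-hop payload that the curl command sends."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {
                        "ids": ["MESH:D056487"],
                        "categories": ["biolink:DiseaseOrPhenotypicFeature"],
                    },
                    "n1": {"categories": ["biolink:DiseaseOrPhenotypicFeature"]},
                    "n2": {"categories": ["biolink:Gene"]},
                    "n3": {"categories": ["biolink:ChemicalEntity"]},
                },
                "edges": {
                    "e01": {"subject": "n0", "object": "n1",
                            "predicates": ["biolink:correlated_with"]},
                    "e02": {"subject": "n2", "object": "n1",
                            "predicates": ["biolink:gene_associated_with_condition"]},
                    "e03": {"subject": "n2", "object": "n3",
                            "predicates": ["biolink:related_to"]},
                },
            }
        },
        "workflow": ["lookup", "connect_knodes", "score"],
    }


def post_query(payload, url=ARAGORN_URL, timeout=600):
    """POST the payload as JSON and return the decoded response (timeout is a guess)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_query()
# result = post_query(payload)  # uncomment to actually contact the service
```

Keeping payload construction separate from the network call makes it easy to inspect or modify the query graph before sending it.
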
When reviewing results from that query, I included the following comment in my notes:

One thing to note is that ARAGORN is labeling this MESH ID, https://meshb.nlm.nih.gov/record/ui?ui=D056487 (Chemical and Drug Induced Liver Injury, Chronic), as drug-induced hepatitis, probably because MESH maps it to this MESH ID: https://meshb.nlm.nih.gov/record/ui?ui=D056486 (Chemical and Drug Induced Liver Injury).

Not sure how helpful the comment is, but I thought I'd pass it along.

@karafecho B.1c is returning results, and B.2c with biolink:treats predicate is returning results but none from ICEES DILI. The remaining workflows don't seem to be active anymore. Are there any other queries you want to test out?

Thanks for following up on this, @maximusunc . I think Patrick's tracker is a bit outdated, however. The current Workflow B queries can be found here.

Thanks!

That said, the B.1c results and B.2c results you report on are expected, so that is good.