geneontology/api-gorest-2021

Replace Barista dependencies in the GO-CAM API

Closed this issue · 22 comments

kltm commented

Recently, it seems that Barista is having trouble, possibly because of increased use or volume, and possibly because of its use as an upstream API, à la:

let url = "http://barista.berkeleybop.org/api/minerva_public/m3Batch?token=&intention=query&requests=%5B%7B%22entity%22%3A%22model%22%2C%22operation%22%3A%22get%22%2C%22arguments%22%3A%7B%22model-id%22%3A%22gomodel%3A" + idtrim + "%22%7D%7D%5D";
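The percent-encoded `requests` parameter is hard to read. Here is a minimal sketch that rebuilds the same URL from a plain request object, to show what is actually being asked of Barista: a single batched "get" of one whole model. The object layout is decoded from the string above; the helper name `baristaModelUrl` is hypothetical and not part of the API.

```javascript
// Sketch: rebuild the Barista m3Batch query URL from a plain request object.
// Layout decoded from the hand-encoded string; `baristaModelUrl` is a
// hypothetical name, not code from api-gorest-2021.
const BARISTA_M3BATCH =
  "http://barista.berkeleybop.org/api/minerva_public/m3Batch";

function baristaModelUrl(idtrim) {
  // One batched request: "get" the whole model with this ID.
  const requests = [
    {
      entity: "model",
      operation: "get",
      arguments: { "model-id": "gomodel:" + idtrim },
    },
  ];
  // JSON.stringify emits no whitespace, and encodeURIComponent escapes
  // [ ] { } " : , the same way the hand-encoded string does.
  return (
    BARISTA_M3BATCH +
    "?token=&intention=query&requests=" +
    encodeURIComponent(JSON.stringify(requests))
  );
}
```

For an example alphanumeric ID, this produces the same string as the hand-written `url` line.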

We'd like to look at other ways of supplying this JSON data to the API in the short run. Ideas include:

  1. render during pipeline and push to public location for use by API
  2. load into solr and have API use that
  3. other?
kltm commented

After a little discussion, I think we will likely go with the rendered JSON route as a short-term solution (which also lets us get to the AmiGO widgets faster), while keeping Solr-ization on the books for when we start bringing the GO-CAM API into the GO API.

kltm commented

@dustine32 Following up from the software meeting, scanning through the code, these seem to be the only instances of contact:
/gocam/:id/raw uses:

let url = "http://barista.berkeleybop.org/api/minerva_public/m3Batch?token=&intention=query&requests=%5B%7B%22entity%22%3A%22model%22%2C%22operation%22%3A%22get%22%2C%22arguments%22%3A%7B%22model-id%22%3A%22gomodel%3A" + idtrim + "%22%7D%7D%5D";

/gocam/:id/activities uses:

let url = "http://barista.berkeleybop.org/api/minerva_public/m3Batch?token=&intention=query&requests=%5B%7B%22entity%22%3A%22model%22%2C%22operation%22%3A%22get%22%2C%22arguments%22%3A%7B%22model-id%22%3A%22gomodel%3A" + idtrim + "%22%7D%7D%5D";

/gocam/:id/enriched uses:

let url = "http://barista.berkeleybop.org/api/minerva_public/m3Batch?token=&intention=query&requests=%5B%7B%22entity%22%3A%22model%22%2C%22operation%22%3A%22get%22%2C%22arguments%22%3A%7B%22model-id%22%3A%22gomodel%3A" + idtrim + "%22%7D%7D%5D";

To my eye, these all appear to be identical and to bypass any kind of middleware, which makes them easy to change. I believe simply routing these to a static location will solve the problems we're having. Can you think of any reason this might not work?
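Since the three routes repeat the same URL, one way to make the switch a one-line change is to route them all through a single loader. This is a minimal sketch under assumptions: `makeModelLoader` and `fetchJson` are hypothetical names, Node 18+ is assumed for the global `fetch`, and `STATIC_BASE` is a placeholder, not a real endpoint.

```javascript
// Hypothetical refactor sketch: /raw, /activities and /enriched share one
// loader, so the upstream URL is decided in exactly one place.
function makeModelLoader(buildUrl, fetchJson) {
  // Returns whatever fetchJson returns (normally a Promise of parsed JSON).
  return (idtrim) => fetchJson(buildUrl(idtrim));
}

// Uses the Node 18+ global fetch; rejects on non-2xx so callers never try
// to parse an error page as a model.
async function fetchJson(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Placeholder base URL (assumption): swap in the real static location.
const STATIC_BASE = "https://example.org/go-cam/";
const loadModel = makeModelLoader((id) => STATIC_BASE + id + ".json", fetchJson);
```

Injecting `fetchJson` also lets the route handlers be tested with a stub instead of hitting Barista or S3.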

@balhoff What would be the best way to approach getting the currently returned raw JSON form out of minerva (or the noctua-models repo)? I'm assuming we don't currently have a command or CLI for this, but I'm wondering how hard it might be to add or if there might be other approaches?

@kltm Thanks, I never realized the barista URLs were repeated like this!

Can you think of any reason this might not work?

Nope, can't think of any reason changing to a static JSON endpoint wouldn't work. I'm all for this.

kltm commented

@balhoff (Bringing this up to a top-level comment for reference)
What would be the best way to approach getting the currently returned raw JSON form out of minerva (or the noctua-models repo)? I'm assuming we don't currently have a command or CLI for this, but I'm wondering how hard it might be to add or if there might be other approaches?

So basically the format returned to Noctua when you request a whole model? I think it would be pretty easy to add a dump CLI command for that.

kltm commented

Okay, with Minerva and initial pipeline work out of the way (geneontology/minerva#500), this becomes a deployment and API issue.
We now have a new product that will start becoming available (already tested on master):
*/products/json/noctua-models-json.tgz, which is currently about 80 MB and contains each model as an individual JSON file. We'll want to:

  • deploy this to a public location as part of the pipeline or release
  • change the API to point at these
kltm commented

@dustine32 For when you're back around, I believe that URLs like this now work (although not automated yet): https://go-public.s3.amazonaws.com/files/go-cam/<GO-CAM_INTERNAL_ID>.json. I guess all that would be needed is 1) automating getting it into S3 (maybe as part of the other JSON pipeline) and 2) changing the URLs.
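For step 2), the API needs to turn whatever ID form it receives into that S3 URL. A minimal sketch, assuming the `<GO-CAM_INTERNAL_ID>` file name is the bare model ID (the value the current code calls `idtrim`) and that callers may pass either `gomodel:…` or a full `http://model.geneontology.org/…` IRI; `goCamJsonUrl` is a hypothetical helper.

```javascript
// Sketch: map a GO-CAM ID to its static JSON file in S3.
// Assumption: the file name is the bare model ID, e.g. "<id>.json".
const GO_CAM_JSON_BASE = "https://go-public.s3.amazonaws.com/files/go-cam/";

function goCamJsonUrl(id) {
  // Accept "gomodel:XYZ", a full model IRI, or an already-bare ID.
  const bare = id
    .replace(/^gomodel:/, "")
    .replace(/^https?:\/\/model\.geneontology\.org\//, "");
  // Only allow plain alphanumeric IDs, so nothing like "../" reaches the URL.
  if (!/^[A-Za-z0-9]+$/.test(bare)) {
    throw new Error(`Unexpected GO-CAM ID: ${id}`);
  }
  return GO_CAM_JSON_BASE + bare + ".json";
}
```

Normalizing in one function keeps the three routes from each re-implementing their own prefix stripping.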

kltm commented

s3cmd -c KEYFILE --acl-public --mime-type=application/json put *.json s3://go-public/files/go-cam/

kltm commented

Will add the above command to issue-265-go-cam-products once the JSON tarball propagates to the release.

kltm commented

Now testing in issue-265-go-cam-products.

kltm commented

@dustine32 Okay, it looks like the regular SOP run of issue-265-go-cam-products during the release process will now supply updated JSON files to the correct location in S3 (above). The final step here, then, is pointing the GO-CAM API at that location. I'm not sure whether it is better to wait until after the Node 14 switchover or to get it done before the deadline. Either way, we can talk more about it on Thursday if we don't overlap before then.

@kltm Thank you! We can quickly test this with dev instances of agr_ui and api-gorest-2021 and then, if there aren't any issues, push it before the Node 14 switchover?

kltm commented

I wouldn't argue with however you think is best to test it. I'm most worried about something getting messed up with MIME types or the like.
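One cheap guard against the MIME type worry is a smoke check against a few uploaded files: confirm each one comes back as `application/json` and actually parses. A minimal sketch; `isJsonContentType` and `checkModelResponse` are hypothetical helpers, and Node 18+ is assumed for `fetch`.

```javascript
// Smoke-check sketch: a served model file should be application/json and parse.
function isJsonContentType(header) {
  if (!header) return false;
  // Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
  const mediaType = header.split(";")[0].trim().toLowerCase();
  return mediaType === "application/json";
}

// Accepts anything shaped like a fetch() Response.
async function checkModelResponse(res) {
  if (!res.ok) {
    throw new Error(`HTTP ${res.status}`);
  }
  const type = res.headers.get("content-type");
  if (!isJsonContentType(type)) {
    throw new Error(`Unexpected Content-Type: ${type || "(none)"}`);
  }
  return res.json(); // rejects if the body is not valid JSON
}
```

For example, `fetch(url).then(checkModelResponse)` on a handful of model files after an upload would flag a missing or wrong `--mime-type` before the API starts depending on them.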

The go-fastapi does not have any dependencies on Barista, so I'm removing this from that project for now. Please, of course, add it back if there are endpoints that use Barista and need to be migrated to the new API in support of the UI. It makes sense to me to have a step in the pipeline (potentially Python scripts instead of API calls with a Barista backend) that produces the necessary files as artifacts without the API dependency.

kltm commented

@sierra-moxon Clarifying initial comment (#6 (comment)), this is no longer in the merged API?

kltm commented

@sierra-moxon Just confirming that the merged GO-CAM API has no Barista connections, correct?

Right; just GOlr and the RDF endpoint.

kltm commented

@sierra-moxon Great--binned!

kltm commented

This got accidentally closed in the shuffle.

kltm commented

This is a dependency for geneontology/wc-gocam-viz#25

kltm commented

Keeping open until public deployment and they're gone gone. (This is why we often have a Clearing state in the kanban.)

kltm commented

@sierra-moxon lapped me on this :)