ncats/stitcher

Allow top/skip on GET https://stitcher.ncats.io/api/stitches/latest when using filters

Opened this issue · 4 comments

The API call appears to support the top and skip query options but they don't actually work.

Can you provide an example URL that demonstrates the kind of filtering that you are doing?
Simple stuff like http://stitcher.ncats.io/api/stitches/latest?skip=10&top=5 seems to be working

Sure... I was trying with /api/stitches/latest?filter=Status/Launched and I attempted a pull request, #147, with some changes that might work. I haven't gotten it to compile yet. I am Java stupid.

OK. What is going on here is a little complicated. Let me try to break it down as best I can. TL;DR: we run out of memory on the filter step, but I am also not sure this filter is giving you what you really want.

First, the "status" property comes from a couple of sources - "Launched" as a value specifically comes from Broad. You can see all the places "status" comes from by looking at https://stitcher.ncats.io/api/datasources and specifically looking for which sources provide a "status" property.

NOTE: this "status" is different from the status shown at drugs.ncats.io. Inxight: Drugs calculates its regulatory status from the "highestPhase" property at the top level of the stitch object. The value of the "highestPhase" property corresponds to the ID of a regulatory event from the "events" array. That referenced event contains the highest development status achieved and the citation information for that event. That means, however, that it is not possible to simply filter the stitches to get all of the marketed drugs, which may be what you actually want. This should probably be the subject of another issue/feature request.
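To make the "highestPhase" lookup concrete, here is a rough Python sketch of resolving it on the client side. It assumes each entry in "events" carries its identifier under an "id" key and a status under a "kind" key; both key names and the sample record are made up for illustration, so check a real stitch before relying on them.

```python
def highest_phase_event(stitch):
    """Return the event that stitch["highestPhase"] points at, or None."""
    target = stitch.get("highestPhase")
    if target is None:
        return None
    for event in stitch.get("events", []):
        # ASSUMPTION: the event identifier lives under the "id" key.
        if event.get("id") == target:
            return event
    return None


# Hypothetical, heavily trimmed stitch record (not real Stitcher output).
example = {
    "highestPhase": "evt-2",
    "events": [
        {"id": "evt-1", "kind": "Clinical Trial"},
        {"id": "evt-2", "kind": "Marketed"},
    ],
}

event = highest_phase_event(example)
print(event["kind"] if event else "no highestPhase event found")
```

A client that wants "all marketed drugs" would have to run something like this over every stitch, which is why a plain property filter does not cover that case.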

If we look at other properties, such as the "WIKIPEDIA" property from GSRS, top and skip work just fine:
https://stitcher.ncats.io/api/stitches/latest?filter=WIKIPEDIA
https://stitcher.ncats.io/api/stitches/latest?filter=WIKIPEDIA&top=3
https://stitcher.ncats.io/api/stitches/latest?filter=WIKIPEDIA&top=3&skip=2

In fact, https://stitcher.ncats.io/api/stitches/latest?filter=WIKIPEDIA/ALOSETRON&top=3&skip=2 properly filters down to a single record, and the now-meaningless top and skip are reset to 1 and 0, respectively, in the response.
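For anyone scripting these calls, a small Python helper along these lines can assemble the same URLs. The function name and its arguments are my own invention; only the endpoint and the filter, top and skip query parameters come from the examples above.

```python
from urllib.parse import urlencode

BASE = "https://stitcher.ncats.io/api/stitches/latest"


def stitches_url(filter_expr=None, top=None, skip=None):
    """Build a stitches/latest URL from optional filter, top and skip values."""
    params = {}
    if filter_expr is not None:
        params["filter"] = filter_expr
    if top is not None:
        params["top"] = top
    if skip is not None:
        params["skip"] = skip
    if not params:
        return BASE
    # safe="/" keeps the slash in filters such as WIKIPEDIA/ALOSETRON readable.
    return BASE + "?" + urlencode(params, safe="/")


print(stitches_url("WIKIPEDIA", top=3, skip=2))
```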

Now, looking specifically at /api/stitches/latest?filter=status/Launched, something funny is going on. On my development instance, this produces an out-of-memory error. In prod, the server might be crashing and restarting without producing any useful response for you. I believe there must be a couple of really large records in the approved products list that exhaust the memory. In fact, when I page over all of the stitches, I increment in steps of 10. That way I can page through all of the JSON for all stitches in about 10 minutes on the prod server.
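The paging approach can be sketched in Python roughly as follows. This is not the script I actually use; fetch_page is a hypothetical callable that you would write yourself to issue the GET with the given skip and top values and return the list of stitch records from the response. The fake fetcher below only stands in for the server so the loop can be run offline.

```python
def iter_stitches(fetch_page, page_size=10):
    """Yield records one by one, requesting page_size records per call."""
    skip = 0
    while True:
        records = fetch_page(skip, page_size)
        if not records:
            break  # nothing left to read
        yield from records
        if len(records) < page_size:
            break  # a short page means this was the last one
        skip += page_size


# Stand-in for the real server: 25 fake records served from memory.
FAKE_RECORDS = list(range(25))


def fake_fetch(skip, top):
    return FAKE_RECORDS[skip:skip + top]


print(sum(1 for _ in iter_stitches(fake_fetch)))
```

Keeping the page size small means no single response has to hold more than a handful of records, which matters when a few of them are very large.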

Leaving the current issue as a bug report ... we need to protect the server from out of memory issues, perhaps by truncating some of the stitch records, somehow.

Could you guys have the server set up to auto-restart if it crashes?