algorithmiaio/langpacks

LoadComplete status sent before load completes

Closed this issue · 3 comments

pmcq commented

Currently langserver sends the "load complete" status update right after launching the pipe script, which does not guarantee that the load actually completed successfully.

Fundamentally, for some languages the load could be arbitrarily long-running code. I see 3 approaches:

  1. Provide a simple mechanism for each language-specific pipe to notify langserver (e.g. create a file) when the load has completed. (I'm mostly opposed because it's very vulnerable to abuse and hard to revert when we discover it being abused, because it could affect pricing of non-abuse scenarios)
  2. Accept this as by design. If the load fails quickly (e.g. bad import), I expect pipe will exit and send that status quickly. If the load is long-running, then the subsequent request will block (and be metered) for the extra time spent loading.
  3. Compromise by starting with the first option, but have langserver time out when waiting >N seconds for that notification from the pipe script. If it's still loading after N seconds, send the load complete anyway. Further loading will continue, but will block the request when it arrives (and be metered). It's basically defining a window of "N seconds of free algorithm initialization compute." Though, this starts feeling very unidiomatic for native languages, where expensive initializers before main are considered bad practice. :-s
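
To make option 3 concrete, here is a minimal sketch of the wait-with-timeout logic, written in Python purely for illustration. It is not langserver's actual code: the names `wait_for_load`, `LoadStatus`, `ready_path`, `pipe_alive`, and `grace_seconds` are all hypothetical, and the "pipe creates a file" signal is just the example mechanism mentioned in option 1.

```python
import os
import time
from enum import Enum


class LoadStatus(Enum):
    LOADED = "loaded"            # pipe signalled that its load finished
    TIMED_OUT = "timed_out"      # grace period elapsed; report load complete anyway
    PIPE_EXITED = "pipe_exited"  # pipe died before signalling (e.g. bad import)


def wait_for_load(ready_path, pipe_alive, grace_seconds, poll_seconds=0.05):
    """Wait up to grace_seconds for the pipe to create ready_path.

    ready_path    -- hypothetical marker file the pipe creates when loaded
    pipe_alive    -- zero-argument callable; True while the pipe process runs
    grace_seconds -- the "N seconds" of unmetered initialization
    """
    deadline = time.monotonic() + grace_seconds
    while True:
        # Check the marker first so a pipe that signals and then exits
        # is still counted as loaded.
        if os.path.exists(ready_path):
            return LoadStatus.LOADED
        if not pipe_alive():
            return LoadStatus.PIPE_EXITED
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return LoadStatus.TIMED_OUT
        time.sleep(min(poll_seconds, remaining))
```

With a real child process, `pipe_alive` could be something like `lambda: proc.poll() is None` for a `subprocess.Popen` object. On `TIMED_OUT` the caller would still send "load complete", and any remaining load time would block (and be metered against) the first request.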

pmcq commented

For 1, I think the abuse scenario is definitely a possibility, though if we start billing for load_time then the abuse will no longer exist. I'm not sure how long loads typically take, but if it's timed by langserver I imagine it would be very fast (less than a second is all that matters), except when people are abusing the system. We could even make it so we bill only for load time >=1s, so people not abusing the system are exempt.
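
As a rough illustration of that exemption, assuming it means "loads under the threshold are free, loads at or over it are billed in full" (billing only the excess over 1s would be another reading), the rule could look like the sketch below. The function name and parameter are hypothetical, not part of langserver.

```python
def billable_load_seconds(load_seconds, free_threshold_seconds=1.0):
    """Return how many seconds of load time to bill.

    Assumed rule: loads shorter than the threshold are exempt; loads at or
    above it are billed for their full duration.
    """
    if load_seconds < 0:
        raise ValueError("load_seconds must be non-negative")
    if load_seconds < free_threshold_seconds:
        return 0.0
    return float(load_seconds)
```

Under this reading a 0.4s load costs nothing, while a 12.5s load is billed for all 12.5 seconds.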

The main issue being addressed is that if a slot is not processing a load or work request, then there is no request_id/session_id that can be used to authenticate a user's actions, so they couldn't access the data API at all (without creating an entirely new Authorization type plus the associated parsing logic, etc.). My concern is that this seemed like a bigger task, though it is possible, and I'd likely just rewrite all the auth code so it isn't so heinous in the first place.

Resolved by 3dece59