Shopify/go-lua

Limiting script execution time reliably

sigmonsays opened this issue · 3 comments

I'm struggling to find a good solution to implementing a reliable mechanism to limit how long a script is allowed to run.

A lot of examples show using debug.sethook to check if a timer has elapsed every 100k instructions or so but this is not that reliable as a C call can block.

Another solution is to use alarm system call or setrlimit on lua in a new thread/process but that has its own set of drawbacks.

Has a script timeout been considered before as something we can set on the VM before executing a pcall?

go-lua was written before the context package was a thing. It would be ideal if most of its exported functions took a ctx context.Context argument, and then you could simply use the context.WithTimeout(...) pattern. That's not possible in general, but in a controlled environment workarounds are possible. For Shopify's genghis tool, we override sleep, for example, to allow context cancellation from the function that invokes the script:

func registerSleep(ctx context.Context, l *lua.State) {
	l.Register("sleep", func(l *lua.State) int {
		ns := lua.CheckNumber(l, 1)
		childCtx, _ := context.WithTimeout(ctx, time.Nanosecond*time.Duration(ns))
		<-childCtx.Done()
		// Check whether the run was terminated or the timeout expired.
		select {
		case <-ctx.Done():
			// Run was terminated.
			lua.Errorf(l, ctx.Err().Error())
		default:
		}
		return 0
	})

	// Override time.sleep from goluago.
	l.PushGlobalTable()
	l.Field(-1, "package")
	l.Field(-1, "loaded")
	l.Field(-1, "goluago/time")
	l.Field(-4, "sleep")
	l.SetField(-2, "sleep")
	l.Pop(4) // Pop the tables.
}

Our most common blocking native functions also take an outer context.

That seems reasonable and was the approach I took for now. It seems easier to control everything a script does which allows being timed out.

While we're at it, some mechanism to limit the VM's RAM usage would be appreciated as well.