mco-gh/gcslock

Unsuitable semantics for long-running processes

Consider a program which runs indefinitely:

  1. A call to gcslock.Lock(l, time.Second) fails for any reason (e.g. it times out).
  2. The underlying locking goroutine will keep running indefinitely, until it succeeds.

[In the meantime, another part of the program may have successfully acquired and released the lock.]

  3. Eventually, the goroutine started in step 1 succeeds in acquiring the lock.
  4. The program is now in an inconsistent state: every subsequent gcslock.Lock will fail until the lock is removed by an external job.

One solution is to simply limit goroutine retries to some max.

Another solution is to stop the locking/unlocking goroutine before returning an error.

Actually, limiting the number of retries of the locking goroutine won't work. I believe the best approach is to always terminate the locking/unlocking goroutine before returning.
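
Roughly, the timed helper follows this shape (a minimal sketch of the pattern, not the library's exact source; it only needs errors, sync and time from the standard library, and timedLock is just an illustrative name):

func timedLock(l sync.Locker, d time.Duration) error {
  done := make(chan struct{})
  go func() {
    // Keeps retrying against GCS until the lock is won, whether or not
    // the caller is still waiting for the result.
    l.Lock()
    close(done)
  }()
  select {
  case <-done:
    return nil
  case <-time.After(d):
    // Returning here leaves the goroutine above running; if it later
    // acquires the lock, nothing will ever release it.
    return errors.New("lock request timed out")
  }
}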

So, some code which does something like this in a goroutine?

// Retry the timed lock until it succeeds.
for {
  err := gcslock.Lock(l, time.Second)
  if err == nil {
    break
  }
}

The timed Lock fails for some reason (e.g. a timeout), and some other part of the same program acquires and releases the lock. Won't this loop eventually obtain the lock properly? I'm not seeing the inconsistency.

That example is not a long-running process. This one is:

// Loop indefinitely, processing one task at a time.
for {
  // Receive a new task from a channel ch.
  task := <-ch
  // Try to obtain the global lock, or skip the task and report the error.
  if err := gcslock.Lock(l, time.Second); err != nil {
    reportf("lock failed: %v", err)
    // Lock failed, so we skip the task and wait for a new one.
    // Unfortunately, the above lock goroutine will keep running,
    // trying to acquire a lock we don't need anymore.
    continue
  }
  execute(task)
  // Release the global lock once the task is done.
  if err := gcslock.Unlock(l, time.Second); err != nil {
    reportf("unlock failed: %v", err)
  }
}

Most importantly, whenever the above gcslock.Lock(l, time.Second) fails, the underlying gcslock goroutine needlessly keeps trying to acquire the lock and may very well succeed at some point. As a result, any future "useful" gcslock.Lock(l, time.Second) call will never succeed until the lock is removed by an external process.

This is very similar to what would happen with a push-to-publish git lock.

What I think needs to happen instead is for the locking goroutine to stop whenever gcslock.Lock returns an error, instead of continuing to run.

I get it. Thanks for clarifying. I need to stop the goroutine when I get a timeout, something like this:

  case <-time.After(d):
    // stop running goroutine here 
    return errors.New("lock request timed out")

Looks like there's no way to externally stop a goroutine, but I'm sure there's some trick I can use. Will do some googling.

Exactly, the // stop running goroutine here part is what's missing.
The locking loop needs a way to be stopped; it currently never exits.

There's no built-in way to stop a goroutine via some kind of "goroutine handle", but it can be controlled using context.WithTimeout or a channel. I would probably use a context with a deadline equal to the d argument passed to gcslock.Lock.
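
Something like this, for instance (just a sketch: contextLocker and LockWithContext are made-up names, assuming the underlying acquisition loop can be made to honour a context; it uses context and time from the standard library):

// A lock whose acquisition loop checks ctx and gives up once ctx is cancelled.
type contextLocker interface {
  LockWithContext(ctx context.Context) error
}

// The deadline travels with the context, so the retry loop inside
// LockWithContext stops by itself once the deadline passes: when this
// returns an error, nothing is left polling GCS in the background.
func timedLock(l contextLocker, d time.Duration) error {
  ctx, cancel := context.WithTimeout(context.Background(), d)
  defer cancel()
  return l.LockWithContext(ctx)
}

The nice property is that the caller's timeout and the retry loop's lifetime become the same thing, so a failed Lock can't leave anything behind.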

On it, first thing tomorrow am. Thanks!

This looks like exactly my scenario.

sync.WaitGroup feels like overkill for just one goroutine:

A WaitGroup waits for a collection of goroutines to finish ...
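
For a single goroutine, a plain done channel does the same job (a quick sketch; the names are illustrative):

done := make(chan struct{})
go func() {
  defer close(done)
  // ... the blocking work, e.g. the l.Lock() call ...
}()
<-done // waits for that one goroutine, same effect as wg.Wait()

For the timed-lock case, the same done channel can also sit in a select next to the context's Done channel.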