proposal: sync: add a HasRun method to sync.Once

Question

proposal: sync: add a HasRun method to sync.Once

Closed this issue 9 years ago · 11 comments

In some cases, a different code path should be taken depending on if sync.Once() has been executed (example below). It is possible to work around this issue by using a mutex, but Once can return this information more efficiently

func (o *Once) HasRun() bool {
    return atomic.LoadUint32(&o.done) == 1
}

Use case:
In gonum's implementation of a multivariate Normal distribution, we only compute and store the covariance matrix if necessary. Most commonly, users need to compute probabilities and generate random numbers, neither of which actually need the covariance matrix. Computing and storing the matrix is O(n^3) operation and O(n^2) respectively, which can be a significant additional cost. We have a setSigma method that uses a sync.Once to compute and store the covariance matrix if it's actually required. This is fine, except there are also operations which only need a single element of the covariance matrix, for example getting the marginal distribution along the single dimension. If the covariance matrix has already been computed (due to some other method call), then it's an O(1) operation to return the marginal. If it has not been computed, then a single element of the covariance matrix can be computed in O(n) time, instead of forcing the entire (n^3) operation of the covariance matrix.

At the moment, we have the code

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
	var std float64
	if n.sigma != nil {
		std = n.sigma.At(i, i)
	} else {
		// Reconstruct the {i,i} diagonal element of the covariance directly.
		for j := 0; j <= i; j++ {
			v := n.lower.At(i, j)
			std += v * v
		}
	}
	return distuv.Normal{
		Mu:     n.mu[i],
		Sigma:  math.Sqrt(std),
		Source: src,
	}
}

// setSigma computes and stores the covariance matrix of the distribution.
func (n *Normal) setSigma() {
	n.once.Do(func() {
		n.sigma = mat64.NewSymDense(n.Dim(), nil)
		n.sigma.FromCholesky(&n.chol)
	})
}

These two methods cannot be called concurrently, because checking that n.sigma == nil directly races against n.Once setting n.sigma. It is possible to work around this using a mutex, with something like

func (n *Normal) sigmaNil() bool {
    n.RLock()
    b := n.sigma == nil
    n.RUnlock()
    return b
}

and a similar transformation to setSigma, but it would be really nice to just have the MarginalNormal code be

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
   var std float64
   if n.once.HasRun() {
   	std = n.sigma.At(i, i)
   } else {
   	// Reconstruct the {i,i} diagonal element of the covariance directly.
               std = ...
   }
}

btracey commented 9 years ago

Okay.

Answer 1 · 2017-02-22T23:47:17.000Z

This seems ripe for misuse. And even in your example code it looks like you're doing it wrong.

Your code permits two goroutines to both see HasRun() return false and then do duplicate initialization work.

You should instead do:

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
     n.once.Do(n.initStd)
     std := n.std
     ....
}

func (n *Normal) initStd() {
    ....
}

Answer 2 · 2017-02-22T23:51:31.000Z

I don't understand. Your initStd (which is our setSigma), computes the entire covariance matrix, which computes O(n^2) matrix elements and takes O(n^3) time. If the covariance matrix is not set, instead MarginalNormalSingle computes one matrix element in O(n) time. That is a significant cost savings. Yes, it's possible that two different goroutines could compute the same marginal, which would duplicate work, but it takes O(n^2) goroutines to compute the same marginal before you lose overall.

Answer 3 · 2017-02-22T23:55:21.000Z

Sorry, I didn't read all your code. I just responded looking at your final snippet.

You can work around this without any new API. Just use atomic.StoreInt32 in your n.once.Do(func() { and LoadInt32 to check whether it's done.

I don't think we want to encourage such racy checks. We also didn't accept Mutex.TryLock in #6123.

Answer 4 · 2017-02-22T23:58:10.000Z

You can work around this without any new API. Just use atomic.StoreInt32 in your n.once.Do(func() { and LoadInt32 to check whether it's done.

By which you mean replicating the code that's in sync.Once?

Answer 5 · 2017-02-23T00:02:03.000Z

Agreeing with @bradfitz . This should not go in.

sync.Once is here exactly to handle concurrent invocations correctly. Every time you provide a simple predicate that doesn't also atomically handle an action depending on the predicate, you are producing racy code in a concurrent environment.

If you still wanted that predicate, you can do exactly as Brad suggested: simply use an atomic store to set the predicate in the function you provide to sync.Once.

Answer 6 · 2017-02-23T17:17:32.000Z

Sorry, I thought I understood yesterday, but I don't think I do anymore.

@bradfitz says that "my code permits two goroutines to see HasRun() as false and do duplicate initialization work". This is not true. sync.Once ensures the initialization work only happens once. In my case here, the code in sync.Once is expensive, but does not necessarily need to happen. If it has happened, then the code uses that work as a shortcut, but if it has not happened it can avoid the work by avoiding the data modified by sync.Once entirely.

@griesemer : Isn't having sync.HasRun exactly atomically handling an action depending on the predicate? If the predicate has run, then it does one way, and if it hasn't, it avoids that memory altogether. HasRun is (proposed as) atomic, so it's not a race condition in the "should set off the race detector" sense of the word.

I'm not necessarily trying to change the argument, but you both seem to be suggesting that the above code is bad. I don't understand why. It's not a race condition if the check happens atomically, and it increases the efficiency (in big O) for almost all executions of the code.

Answer 7 · 2017-02-23T17:36:51.000Z

@btracey Think about the code you'd write using the HasRun predicate: Even if HasRun returns false, by the time the caller checks the result of HasRun (which is still false), the code may actually have run (and if you'd call HasRun again it would return true) - hence the race condition.

Answer 8 · 2017-02-23T17:53:36.000Z

Right, I understand that. This is why there needs to be protection with an atomic or mutex.

The code that uses the answer returns the same number in either condition, it just uses a different means to acquire the number. On the one hand, I can see the argument that this kind of unpredictability is bad, but on the other using sync.Once in the first place doesn't seem all that different -- it's impossible to tell if a specific goroutine will get to use the fast path or have to wait for the call to f() to conclude.

If the argument is just "this is too dangerous a tool to expose", then understood. It seemed like there was an additional "and changing your code to use the atomics is a bad idea" and I'm trying to understand why.

Answer 9 · 2017-02-23T21:06:33.000Z

@btracey It's a dangerous tool to expose exactly because it can't be used correctly without yet another synchronization mechanism. So why not use just one mechanism instead?

Answer 10 · 2017-02-23T21:24:42.000Z

Well, with HasRun exposed it would be the same synchronization mechanism (the done field in Once), while without the method another (or altogether different) synchronization mechanism is required. However, I understand your point that HasRun gives the impression the call either has or hasn't run, while in reality all that can be said is that the call either has definitely run or it has maybe run.