golang/go

time: Timer.Stop documentation example easily leads to deadlocks

palsivertsen opened this issue · 13 comments

I needed timeout functionality for one of my projects, so I looked in the time package. My timeouts were fallbacks in case a channel receive took too long. Most of the time the channel would receive before the timeout, and I wanted to release the timeout resources when they were no longer needed. The documentation for time.After() says:

[...] If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.

So I used a time.Timer and according to the documentation for time.Timer.Stop() one should drain the channel if time.Timer.Stop() returns false:

if !t.Stop() {
	<-t.C
}

I later discovered that my goroutines got stuck on the receive, as in this playground example, when the timer had fired before I called Stop:

t := time.NewTimer(time.Second * 3)
defer func() {
	if !t.Stop() {
		<-t.C
	}
}()
<-t.C

Wrapping the drain in a select seems to do the trick:

t := time.NewTimer(time.Second * 3)
defer func() {
	t.Stop()
	select {
	case <-t.C:
	default:
	}
}()
<-t.C

Documentation should make it clear how to safely drain the channel.

TL;DR: This is incorrect usage. The documentation does kind of mention it, but it takes a while to understand it correctly, so although it is documented, it could be documented better.

For example, assuming the program has not received from t.C already

t := time.NewTimer(time.Second * 3)
defer func() {
	if !t.Stop() {
		<-t.C
	}
}()
<-t.C

Isn't this incorrect usage, because you've already received from t.C? Isn't the point of the timer to fire after the delay unless you call Stop on it? <-t.C already waits for the timer to fire, so the Stop in the deferred function is entirely useless: the timer has already fired anyway.

The way it actually works is that Stop() returns false if the timer has already fired, which means that, unless you have ALREADY read from it, there's a value in t.C you might want to read. Obviously this doesn't work if you've already read from t.C: Stop will return false regardless (as the timer has already fired), but you've already read from t.C earlier, so you deadlock on <-t.C.

t := time.NewTimer(time.Second * 3)
defer func() {
	t.Stop()
	select {
	case <-t.C:
	default:
	}
}()
<-t.C

This prevents the deadlock, sure, and it's always safe to do. If Stop() returns true you enter the default case, and if it returns false you enter the default case as well, because t.C is empty since you've already read from it. But since you enter the default case either way in this example, you might as well remove the whole select. Still, this isn't the intended usage of Stop().
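To make the behaviour described above easier to check, here is a minimal sketch. The names stopAfterReceive and demoDelay are made up for illustration. It receives the timer's value first, then reports what Stop returns and whether anything is still waiting in t.C:

```go
package main

import (
	"fmt"
	"time"
)

// demoDelay is an arbitrary short duration chosen for this sketch.
const demoDelay = 10 * time.Millisecond

// stopAfterReceive waits for the timer to fire, consumes its value,
// and then reports what Stop returns and whether t.C still holds
// a value.
func stopAfterReceive(d time.Duration) (stopped, pending bool) {
	t := time.NewTimer(d)
	<-t.C // the timer has fired and its only value is now consumed

	stopped = t.Stop() // nothing is left to stop at this point
	select {
	case <-t.C:
		pending = true
	default:
		// t.C is empty, so a plain <-t.C here would block forever.
	}
	return stopped, pending
}

func main() {
	stopped, pending := stopAfterReceive(demoDelay)
	fmt.Println("Stop returned:", stopped, "value pending:", pending)
}
```

Both results should come back false, which is exactly the situation in which the unconditional drain blocks.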

FWIW: This would be an example of proper usage:

package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.NewTimer(time.Second * 3)
	foo := make(chan int)
	go func() { foo <- 1 }()
	select {
	case <-t.C:
		fmt.Println("timeout")
	case <-foo:
		fmt.Println("foo")
		if !t.Stop() {
			<-t.C
		}
	}
}

My timeouts were fallbacks in case a channel receive took too long. Most of the time the channel would receive before the timeout, and I wanted to release the timeout resources when they were no longer needed.

I believe you don't really need to drain the timer channel for this, calling Timer.Stop will suffice:

timer := time.NewTimer(3 * time.Second)
defer timer.Stop()
select {
case res := <-workChannel:
	return res, nil
case <-timer.C:
	return nil, ErrTimeout
}

You can find this pattern in use in the standard library.
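For reference, the fragment above can be turned into a complete program along these lines. This is only a sketch: recvWithTimeout, ErrTimeout and demoTimeout are hypothetical names, and the channel carries ints purely to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTimeout is a hypothetical sentinel error for this sketch.
var ErrTimeout = errors.New("timed out")

// demoTimeout is an arbitrary duration chosen for the demo.
const demoTimeout = 50 * time.Millisecond

// recvWithTimeout waits for a value on work and gives up after d.
// The deferred Stop releases the timer if work wins the select.
// No drain is needed because nothing reads timer.C afterwards.
func recvWithTimeout(work <-chan int, d time.Duration) (int, error) {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case res := <-work:
		return res, nil
	case <-timer.C:
		return 0, ErrTimeout
	}
}

func main() {
	ready := make(chan int, 1)
	ready <- 42
	fmt.Println(recvWithTimeout(ready, demoTimeout))

	never := make(chan int) // nobody ever sends on this channel
	fmt.Println(recvWithTimeout(never, demoTimeout))
}
```

The timer is local to the function and never reused, so a value left behind in timer.C is simply discarded along with the timer.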

@FMNSSun
Thanks for the explanation. Your example looks somewhat like what I did in the first place. But I had more channels in my select and didn't like all the extra t.Stop() calls:

t := time.NewTimer(time.Second * 3)
foo := make(chan int)
chicken := make(chan int)
egg := make(chan int)
go func() { foo <- 1 }()
select {
case <-t.C:
	fmt.Println("timeout")
case <-foo:
	fmt.Println("foo")
	if !t.Stop() {
		<-t.C
	}
case <-chicken:
	fmt.Println("chicken")
	if !t.Stop() {
		<-t.C
	}
case <-egg:
	fmt.Println("egg")
	if !t.Stop() {
		<-t.C
	}
}

@artyom

I believe you don't really need to drain the timer channel for this, calling Timer.Stop will suffice

Looks scary. What if the timer fires between the select block and the deferred Stop? Won't you have a thread stuck on a channel send?

@palsivertsen No because timer uses a buffered channel with capacity 1 exactly for this reason: that it can't get stuck if there's nobody reading from it.

Also... it might make sense in that case to move the t.Stop past the select instead of repeating it in every case.

timer uses a buffered channel with capacity 1 exactly for this reason: that it can't get stuck if there's nobody reading from it.

Cool. I did not know that.

Also... it might make sense in that case to move the t.Stop past the select instead of repeating it in every case.

Wouldn't that deadlock if case <-t.C: happens?

@palsivertsen It would, but you could set a flag in the case <-t.C branch and then only call Stop if that flag isn't set. But that's probably a matter of personal taste.
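A rough sketch of that flag idea, for anyone reading along. The function waitFirst, the constant demoWait and the returned strings are invented for this example. The drain only matters if the timer is going to be reused.

```go
package main

import (
	"fmt"
	"time"
)

// demoWait is an arbitrary timeout chosen for this sketch.
const demoWait = 50 * time.Millisecond

// waitFirst reports which channel delivered first, or "timeout".
// Stop is called once after the select instead of in every case.
func waitFirst(foo, chicken, egg <-chan int, d time.Duration) string {
	t := time.NewTimer(d)
	fired := false // set when the timer value was received in the select
	var got string
	select {
	case <-t.C:
		fired = true
		got = "timeout"
	case <-foo:
		got = "foo"
	case <-chicken:
		got = "chicken"
	case <-egg:
		got = "egg"
	}
	// Skip Stop and the drain entirely if the select already consumed
	// the timer value; otherwise <-t.C would block forever.
	if !fired && !t.Stop() {
		<-t.C
	}
	return got
}

func main() {
	foo := make(chan int, 1)
	foo <- 1
	fmt.Println(waitFirst(foo, nil, nil, demoWait))
	fmt.Println(waitFirst(nil, nil, nil, demoWait))
}
```

Receiving from a nil channel blocks forever, so passing nil for the unused channels in main is just a shortcut to leave those cases permanently unready.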

+1, this issue just cost me an hour :( I also have the keep-alive scenario. Perhaps the docs should link to this discussion?

I don't think we want to link to this discussion.

Does anyone have specific improvements to suggest? Anyone want to send a pull request? Thanks.

Some suggestions/thoughts:

  • Update the doc to underline that ignoring the value of the time.Timer.C channel is safe because the channel has a buffer length of one.
  • Remove channel draining from the time.Timer.Stop documentation. Are there any use cases where one would need or want to drain the channel when stopping (not resetting) the timer?
  • Make time.Timer.Reset do the draining internally, thus no need to expose the drain concept in the docs. This sadly changes the behaviour of time.Timer.Reset, possibly breaking existing code. Adding time.Timer.ResetAndDrain might be an alternative.
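To illustrate the last suggestion, a user-level helper along these lines could stand in for the proposed method. This is a sketch only: resetAndDrain and firesEarly are hypothetical names, nothing here is part of package time, and the non-blocking drain assumes no other goroutine is receiving from t.C at the same time.

```go
package main

import (
	"fmt"
	"time"
)

// resetAndDrain is a hypothetical helper, not part of package time.
// It stops t, discards a fired-but-unread value if one is waiting,
// and re-arms the timer for d.
func resetAndDrain(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		select {
		case <-t.C: // drop the stale value
		default: // the value was already received elsewhere
		}
	}
	t.Reset(d)
}

// firesEarly reports whether a reused timer delivers a value well
// before its new deadline, which would mean a stale value was left
// in t.C.
func firesEarly() bool {
	t := time.NewTimer(10 * time.Millisecond)
	time.Sleep(30 * time.Millisecond) // let it fire without receiving

	start := time.Now()
	resetAndDrain(t, 100*time.Millisecond)
	<-t.C
	return time.Since(start) < 50*time.Millisecond
}

func main() {
	fmt.Println("fired early:", firesEarly())
}
```

The select with a default case keeps the helper from blocking when the value was already consumed, which is the deadlock this issue is about.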

Change https://golang.org/cl/185245 mentions this issue: time: clarify when draining a Timer's channel is needed

@palsivertsen No because timer uses a buffered channel with capacity 1 exactly for this reason: that it can't get stuck if there's nobody reading from it.

Also... it might make sense in that case to move the t.Stop past the select instead of repeating it in every case.

@FMNSSun do you have a source for this?

rsc commented

The documentation no longer gives this example, because it is no longer necessary.