mediocregopher/radix

Read timeout not working with v3

jarpit96 opened this issue · 2 comments

Hi, I have following radix setup in my project

customConnFunc := func(network, addr string) (radix.Conn, error) {
		c, err := radix.Dial(network, addr,
			radix.DialConnectTimeout(5*time.Second),
			radix.DialReadTimeout(200*time.Microsecond),
			radix.DialWriteTimeout(200*time.Microsecond),
		)
poolFunc := func(network, addr string) (radix.Client, error) {
			return radix.NewPool(network, addr, config.PoolSize, radix.PoolConnFunc(customConnFunc))
		}
redixCluster, err = radix.NewCluster(config.RedisClusterNodes, radix.ClusterPoolFunc(poolFunc), radix.ClusterSyncEvery(time.Duration(config.ClusterSyncInSec)*time.Second))

The read timeout is not working as expected.
If I set a timeout of say 100 microseconds, it still takes more time than that while processing.

Code used for timing -

start := time.Now()
err := redis.RedixInstance().Do(radix.Cmd(&b, "MGET", s...))
if err != nil {
       fmt.Println(time.Now().Sub(start).String())
       fmt.Println(err)
}

Output -

//Not timing out in most cases
553.798µs

//Timing out in some cases, but still time taken is greater than configured timeout (200 microseconds)
277.94µs
read tcp 127.0.0.1:56882->127.0.0.1:7006: i/o timeout

The read timeout applies only to time spent reading from the network connection.

Your example uses a mostly default pool configuration which means that it's using implicit pipelining which by default causes commands to be delayed by up to 150us.

So assuming the worst case delay of 150us from implicit pipelining and a read timeout of 100us your command will take at least 250us before you get the error.

On the other hand if your command needs multiple read from the connection, each read is subject to the 100us read timeout. So the one read could take take 90us and another one 80us and you wouldn't get a timeout since no single read took over 100us.

But if the first read takes 90us and the second 110us, the second would time out after 100us and your command would return after 90+100=190us which is again more than 100us.

And that is ignoring any extra latency caused by implicit pipelining or the code itself (which also takes time, e.g. for marshalling, unmarshalling and unmarshalling, including reflection and allocations or just getting a connection from the pool in some cases!).

If you want to know exactly why it is taking so long you can try creating a block profile using pprof or maybe a runtime trace. See here for more information.

Note that radix v4 will add support for context.Context when calling commands so timeouts should be better there, though still not necessarily perfectly precise because of issues like #274

There are also some issues about timer accuracy on the Go issue tracker so I wouldn't be surprised if this is also partly Go's fault.

@jarpit96 please re-open this if you still have any questions about this or are experiencing issues.