mediocregopher/radix

Reconnecting startup time

Closed this issue · 6 comments

I am aware that a Client actively tries to reconnect to a server that became unreachable. However, the Client will not try to connect to a server that was reachable when the Client was instantiated; instead, it returns an error.

Scenario: I made an application that connects to a set of unique Redis servers simultaneously and updates the value of some keys when some events happen. It is not a problem if a random Redis server of these becomes unreachable (and comes back later on), but it is definitely a problem if a server is never getting (re)connected to if my application happens to be starting up at a time when one of the servers was unreachable.

How could I overcome this situation? I don't want to restart my application periodically to have it connected to missing servers, as I would miss events happening meanwhile.

Should I make a mock Client that instatiates a Client instance repeatedly until it connects?

Thank you in advance.

Hi @endreszabo, if I understand your question the simplest thing would be to try to create your Clients in a for loop at the startup of your application:

var (
    radixClient radix.Client
    err error
)

for {
    if radixClient, err = (PoolConfig{}).New(ctx, "tcp", addr); err == nil {
        break
    }

    log.Printf("Error connecting to redis at %q: %v", addr, err)
    time.Sleep(1 * time.Second)
}

// radixClient will be initialized at this point

Would that work? Or maybe I'm not understanding the question correctly.

Hi Brian and thank you for the quick reply. This is almost good for me, as long as I can apply it to multiple servers and make the reconnecting happen in the background (a goroutine).

My Clients are used in the following structs:

type redisServer struct {
        ctx    context.Context
        server radix.Client
}
type redisServers struct {
        servers []redisServer
}

Where I have funcs for the redisServers that iterate through the servers list. Two example of these funcs:

func (s *redisServers) radixMultiDo(cmd string, args ...string) (*radix.Client, error) {
        for _, ss := range s.servers {
                err := ss.server.Do(ss.ctx, radix.Cmd(nil, cmd, args...))
                if err != nil {
                        return &ss.server, err
                }
        }
        return nil, nil
}

func (s *redisServers) radixMultiCmd(cmd string, args ...string) (map[radix.Client]string, *radix.Client, error) {
        rvMap := make(map[radix.Client]string)
        for _, ss := range s.servers {
                var rv string
                err := ss.server.Do(ss.ctx, radix.Cmd(&rv, cmd, args...))
                rvMap[ss.server] = rv
                if err != nil {
                        return nil, &ss.server, err
                }
        }
        return rvMap, nil, nil
}

The application needs to be able to start up and issue commands to Redis servers available at any given time, and waiting with startup for the unreachable one to get connected is not preferred. In the worst scenario, I'll launch one binary per Redis server and somehow fan-out the events to these.

I think making a Client a pointer to a Client and (re)connecting to the server from a goroutine on startup would work. Once connected, the goroutine can end. Later on, I can easily check if a client is defined/initialized.

I think making a Client a pointer to a Client and (re)connecting to the server from a goroutine on startup would work. Once connected, the goroutine can end. Later on, I can easily check if a client is defined/initialized.

I just quicky dismiss this as Client is an interface.

I think making a Client a pointer to a Client and (re)connecting to the server from a goroutine on startup would work. Once connected, the goroutine can end. Later on, I can easily check if a client is defined/initialized.

That is the essential idea, but it's a bit tricky because you want to be sure that calls to Do block until the goroutine has successfully created a connection. To do this you can have the goroutine close a channel once it has created the connection.

I went ahead and put together a proof-of-concept of how I would go about this, in such a way that you don't really have to change any of your existing code:

type dialRetryClient struct {
	c           radix.Client
	connectedCh chan struct{}
	closedCh    chan struct{}
}

// NewDialRetryClient returns a Client which will continually call the given
// dialFn until it returns successfully. Until then all calls to Do will block.
//
// Example:
//
//	client := NewDialRetryClient(func(ctx context.Context) (radix.Client, error) {
//		return (radix.PoolConfig{}).New(ctx, "tcp", "127.0.0.1:6379")
//	})
func NewDialRetryClient(
	dialFn func(ctx context.Context) (radix.Client, error),
) (
	radix.Client, error,
) {
	c := &dialRetryClient{
		connectedCh: make(chan struct{}),
		closedCh:    make(chan struct{}),
	}

	// Create a context which will be canceled when Close is called.
	dialCtx, cancel := context.WithCancel(context.Background())
	go func() {
		<-c.closedCh
		cancel()
	}()

	go func() {
		defer close(c.connectedCh)
		for {
			innerC, err := dialFn(dialCtx)
			if dialCtx.Err() != nil {
				return
			} else if err != nil {
				log.Printf("error connecting to redis: %v", err)
				continue
			}

			c.c = innerC
			return
		}
	}()

	return c, nil
}

func (c *dialRetryClient) Addr() net.Addr {
	select {
	case <-c.connectedCh:
	case <-c.closedCh:
		panic("Addr called on closed client")
	default:
		// There's no way to return an error here. For your use-case it might
		// not matter, if it does you might be able to initialize the
		// dialRetryClient with an addr field you can return from this.
		panic("Addr called before client is connected")
	}

	return c.c.Addr()
}

func (c *dialRetryClient) Do(ctx context.Context, a radix.Action) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-c.connectedCh:
	case <-c.closedCh:
		return errors.New("client is closed")
	}

	return c.c.Do(ctx, a)
}

func (c *dialRetryClient) Close() error {
	close(c.closedCh)
	<-c.connectedCh
	if c.c != nil {
		return c.c.Close()
	}
	return nil
}

You can use NewDialRetryClient to get a custom Client which will wrap any other Client, such that Do will block until dialing succeed. It also correctly handles the Close case, making sure the dialing go-routine is shutdown regardless of if it ever succeeded.

(I haven't actually tested this code yet, but it looks right to me 😅 )

Thanks for your extensive help, Brian. I will close this for now.