yorkie-team/yorkie

Intermittent connection refused error in TestSDKRPCServerBackend

hackerwins opened this issue · 2 comments

What happened:

Intermittently, when executing the TestSDKRPCServerBackend test, the client encounters a "Connection Refused" error when attempting to connect to the Server.

What you expected to happen:

The connection between the Client and Server should be established successfully without any errors.

How to reproduce it (as minimally and precisely as possible):

Run the TestSDKRPCServerBackend test.
Monitor the test execution and observe if any intermittent "Connection Refused" errors occur.

https://github.com/yorkie-team/yorkie/actions/runs/7782806743/job/21219978503

Anything else we need to know?:

After merging the MongoDB sharding PR, intermittent test failures have been observed. These failures are suspected to be related to the separation of RPC test cases introduced in that PR.

#776

Environment:

  • Operating system: N/A
  • Browser and version: N/A
  • Yorkie version (use yorkie version): v0.4.14
  • Yorkie JS SDK version: N/A

I suspect that the test runs before the test server has finished starting up.
Ref: #782 (comment)
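
To illustrate the suspected failure mode outside of the Yorkie code base, here is a minimal sketch of what a client sees when it dials a port that nothing is listening on yet. The function name dialBeforeListen is hypothetical, and reserving and then releasing a port is only a convenient way to get an address with no listener; it is not how the Yorkie tests choose their ports.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// dialBeforeListen (hypothetical name) reserves a free local port, releases
// it, and then dials it while nothing is listening. This mimics a client
// that races ahead of a server whose listener is not up yet.
func dialBeforeListen() error {
	probe, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("reserve port: %w", err)
	}
	addr := probe.Addr().String()
	if err := probe.Close(); err != nil {
		return fmt.Errorf("release port: %w", err)
	}

	// No listener exists on addr at this point, so the dial is expected to fail.
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	err := dialBeforeListen()
	fmt.Println("dial error:", err)
	// On Unix-like systems the error usually wraps ECONNREFUSED.
	fmt.Println("connection refused:", errors.Is(err, syscall.ECONNREFUSED))
}
```

If the test client dials in the window before the server goroutine has bound its port, it would hit the same kind of error, which matches the intermittent nature of the failure.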

How to reproduce this issue

Add a time.Sleep(5 * time.Second) call at the start of the goroutine inside the server's listenAndServe(), so that the listener comes up later than the test expects.

func (s *Server) listenAndServe() error {
	go func() {
		time.Sleep(5 * time.Second)
		logging.DefaultLogger().Infof(fmt.Sprintf("serving RPC on %d", s.conf.Port))
...

How to resolve this issue

Add a helper function that waits for the server to start by probing it with net.DialTimeout() and retrying with exponential backoff, as shown below.

import (
	"fmt"
	"net"
	gotime "time"
)

// WaitForServerToStart waits until a TCP connection to addr succeeds,
// retrying with exponential backoff.
func WaitForServerToStart(addr string) error {
	maxRetries := 10
	initialDelay := 100 * gotime.Millisecond
	maxDelay := 5 * gotime.Second

	for attempt := 0; attempt < maxRetries; attempt++ {
		// Exponential backoff: double the delay on each attempt, capped at maxDelay.
		delay := initialDelay * gotime.Duration(1<<uint(attempt))
		if delay > maxDelay {
			delay = maxDelay
		}

		conn, err := net.DialTimeout("tcp", addr, 1*gotime.Second)
		if err != nil {
			// The server is not accepting connections yet; back off and retry.
			gotime.Sleep(delay)
			continue
		}

		// The server is reachable; the probe connection is no longer needed.
		return conn.Close()
	}

	return fmt.Errorf("failed to connect to server at %s after %d attempts", addr, maxRetries)
}