test: TestIssue2746
xiang90 opened this issue · 6 comments
=== RUN TestIssue2746
--- FAIL: TestIssue2746 (1.67s)
cluster_test.go:360: #1: watch on http://127.0.0.1:20114 error: client: etcd cluster is unavailable or misconfigured
Not able to reproduce... Will try more...
Still reproducible (less than 1%) with the latest version (d32113a) on my machine (Xeon E3, 4 cores)
Can you type assert that error to client.ClusterError and print out its detail? (https://github.com/coreos/etcd/blob/master/client/cluster_error.go#L19-L33)
I got this ClusterError:
--- FAIL: TestIssue2746 (6.36s)
cluster_test.go:351: create on http://127.0.0.1:20950 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: read tcp 127.0.0.1:49676->127.0.0.1:20950: i/o timeout)
Note that this error is raised from a slightly different point than the original one.
diff --git a/integration/cluster_test.go b/integration/cluster_test.go
index 4d7e9e0..c1be43d 100644
--- a/integration/cluster_test.go
+++ b/integration/cluster_test.go
@@ -347,7 +347,8 @@ func clusterMustProgress(t *testing.T, membs []*member) {
key := fmt.Sprintf("foo%d", rand.Int())
resp, err := kapi.Create(ctx, "/"+key, "bar")
if err != nil {
- t.Fatalf("create on %s error: %v", membs[0].URL(), err)
+ cerr := err.(*client.ClusterError)
+ t.Fatalf("create on %s error: %v(detail: %s)", membs[0].URL(), err, cerr.Detail())
}
cancel()
@@ -357,7 +358,9 @@ func clusterMustProgress(t *testing.T, membs []*member) {
mkapi := client.NewKeysAPI(mcc)
mctx, mcancel := context.WithTimeout(context.Background(), requestTimeout)
if _, err := mkapi.Watcher(key, &client.WatcherOptions{AfterIndex: resp.Node.ModifiedIndex - 1}).Next(mctx); err != nil {
- t.Fatalf("#%d: watch on %s error: %v", i, u, err)
+ cerr := err.(*client.ClusterError)
+ t.Fatalf("#%d: watch on %s error: %v(detail: %s)", i, u, err, cerr.Detail())
+
}
mcancel()
}
@heyitsanthony Can you take this over? I cannot reproduce this on my local machine :(. Thanks!
ETCD_ELECTION_TIMEOUT_TICKS wasn't set on Semaphore the way it is on Travis, so a new election was being triggered, which caused the lost leader to drop messages. I tried to repro with the election ticks set to 600 and it seemed to work OK. Updated Semaphore and marking this as closed.