mqingyn/tornetcd

eternal_watch throws exception: EtcdEventIndexCleared: The event in requested index is outdated and cleared : the requested history has been cleared

Closed this issue · 2 comments

I have a small 5-node etcd cluster and a set of 3 processes that listen on an etcd file using eternal_watch, while at the same time repeatedly trying to test_and_set the same etcd file every 29 seconds. The file's TTL is set to 30 seconds. I'm using this to implement master election between the 3 processes.
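
For readers following along, the election scheme described above can be sketched directly against the etcd v2 HTTP API, independent of tornetcd. This is a minimal sketch under assumptions, not the code from this report: the key name, node id, and both function names are hypothetical. It relies on the v2 `prevExist` and `prevValue` conditions and treats HTTP 412 (condition failed) or 404 (key already expired) as "not master this round".

```python
import urllib.error
import urllib.parse
import urllib.request


def build_election_request(base_url, key, node_id, ttl, holding_lock):
    """Build the PUT that either claims the key or refreshes our own claim."""
    # Current master: refresh only if the key still holds our id.
    # Everyone else: create the key only if it does not exist yet.
    condition = {"prevValue": node_id} if holding_lock else {"prevExist": "false"}
    url = "{}/v2/keys/{}?{}".format(
        base_url.rstrip("/"), key.lstrip("/"), urllib.parse.urlencode(condition))
    body = urllib.parse.urlencode({"value": node_id, "ttl": ttl}).encode("ascii")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def try_become_master(request, opener=urllib.request.urlopen):
    """Return True if etcd accepted the conditional write, False if we lost."""
    try:
        with opener(request):
            return True
    except urllib.error.HTTPError as err:
        # 412: another node holds the key; 404: our claim expired before the refresh.
        if err.code in (404, 412):
            return False
        raise
```

Each process would call this on a timer shorter than the TTL (29 seconds against a 30-second TTL in the setup above), passing `holding_lock=True` only while it believes it is the master.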

Everything seems to be running fine for about 40 minutes. After that the processes that do not succeed in the test_and_set operation keep receiving the EtcdEventIndexCleared in eternal_watch, with the error:

EtcdEventIndexCleared: The event in requested index is outdated and cleared : the requested history has been cleared

The CPU utilization spikes to 100% on the python processes, and etcd's CPU utilization increases dramatically too.

The exception actually happens in the read method. It seems to be an error caused by etcd, not by tornetcd. I'm using etcd 2.3.6.

I was wondering if you have seen this problem before and how you fixed it.

Take a look at this first: https://coreos.com/etcd/docs/latest/api.html#waiting-for-a-change. It may help you.

Watch from cleared event index

If we miss all the 1000 events, we need to recover the current state of the watching key space through a get and then start to watch from the X-Etcd-Index + 1.

For example, we set /other="bar" 2000 times and try to wait from index 8.

curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=8'

We get the index is outdated response, since we miss the 1000 events kept in etcd.

{"errorCode":401,"message":"The event in requested index is outdated and cleared","cause":"the requested history has been cleared [1008/8]","index":2007}

To start watch, first we need to fetch the current state of key /foo:

curl 'http://127.0.0.1:2379/v2/keys/foo' -vv
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Cluster-Id: 7e27652122e8b2ae
< X-Etcd-Index: 2007
< X-Raft-Index: 2615
< X-Raft-Term: 2
< Date: Mon, 05 Jan 2015 18:54:43 GMT
< Transfer-Encoding: chunked
<
{"action":"get","node":{"key":"/foo","value":"bar","modifiedIndex":7,"createdIndex":7}}

Unlike watches we use the X-Etcd-Index + 1 of the response as a waitIndex instead of the node’s modifiedIndex + 1 for two reasons:

1. The X-Etcd-Index is always greater than or equal to the modifiedIndex when getting a key because X-Etcd-Index is the current etcd index, and the modifiedIndex is the index of an event already stored in etcd.
2. None of the events represented by indexes between modifiedIndex and X-Etcd-Index will be related to the key being fetched.

Using the modifiedIndex + 1 is functionally equivalent for subsequent watches, but since it is smaller than the X-Etcd-Index + 1, we may receive a 401 EventIndexCleared error immediately.

So the first watch after the get should be:

curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2008'
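
The recovery procedure quoted above can be sketched in Python as a watch loop that resyncs whenever etcd reports errorCode 401. This is a rough illustration against the raw v2 HTTP API, not tornetcd's implementation: `http_get`, `watch_with_resync`, and the `max_callbacks` parameter are hypothetical names introduced here, and the `fetch` argument exists only so the loop can be exercised without a live cluster.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

EVENT_INDEX_CLEARED = 401  # etcd v2 errorCode in the JSON body, not an HTTP status


def http_get(url):
    """GET a URL; return (lower-cased headers, parsed JSON body), errors included."""
    try:
        resp = urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        resp = err  # etcd error replies still carry a JSON body
    with resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        return headers, json.loads(resp.read().decode("utf-8"))


def watch_with_resync(base_url, key, handle, fetch=http_get, max_callbacks=None):
    """Watch one key; after a cleared index, re-read it and resume from X-Etcd-Index + 1."""
    key_url = base_url.rstrip("/") + "/v2/keys/" + key.lstrip("/")
    wait_index = None  # None means a fresh GET is needed before watching
    delivered = 0
    while max_callbacks is None or delivered < max_callbacks:
        if wait_index is None:
            # Recover current state; this body may itself be an error (e.g. key missing).
            headers, body = fetch(key_url)
            wait_index = int(headers["x-etcd-index"]) + 1
        else:
            query = urllib.parse.urlencode({"wait": "true", "waitIndex": wait_index})
            headers, body = fetch(key_url + "?" + query)
            if body.get("errorCode") == EVENT_INDEX_CLEARED:
                wait_index = None  # history was cleared: resync on the next pass
                continue
            if "errorCode" in body:
                raise RuntimeError(body.get("message", "etcd error"))
            wait_index = body["node"]["modifiedIndex"] + 1
        handle(body)
        delivered += 1
```

The callback receives both the resynced state (`"action": "get"`) and the watch events, so a caller that only cares about the latest value can treat them the same way. A production loop would probably also want a request timeout and a short pause between retries.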

Great, thanks for the pointer!

I replaced the eternal_watch with a regular watch in my code, since I don't care about the events happening in between an old value of modifiedIndex and its current value. This seems to fix the issue I was having.