willemt/raft

When handling appendentries_response, where do the non-raft fields come from?

pmembrey opened this issue · 7 comments

Hi,

I've got my implementation to the point where the cluster is now stable, in that a leader can be elected and maintain its authority over the cluster. So far so good :)

However now that I'm implementing raft_recv_appendentries_response I've hit something of a snag. There are two fields (highlighted in the header file as being non-raft) that I don't know how to provide:

/* Non-Raft fields follow: */

/* Non-Raft fields follow: */
/* Having the following fields allows us to do less book keeping in
 * regards to full fledged RPC */
/* This is the highest log IDX we've received and appended to our log */
int current_idx;
/* The first idx that we received within the appendentries message */
int first_idx;

What should I be putting in current_idx and first_idx and where should I get it from?

If I leave them as zero (there are no log entries at this stage committed or otherwise as it's a fresh start), it core dumps with:

raftxmpp: src/raft_log.c:97: log_get_from_idx: Assertion `0 <= idx - 1' failed.

Hi Pete

You don't need to populate those two fields. Those fields are populated by the raft_recv_appendentries function

I'll have a look, but I don't think the raft library will result in an operation that results in that assertion firing. It might be caused by the way you're interfacing with the library (this happened to me, when my deserializer wasn't populating the msg_appendentries_response_t fields correctly!)

Also just got this after a period of time running:

raftxmpp: src/raft_server.c:183: raft_recv_appendentries_response: Assertion `0 <= raft_node_get_next_idx(p)' failed.
Aborted (core dumped)

Perfectly willing to believe it's something I'm doing :)

However this is what I'm putting in:

msg_appendentries_response_t  *res = calloc(1,sizeof(msg_appendentries_response_t));
res->success = success; // either 1 or 0 - but the cluster doens't send data so always 1
res->term = term; // Doesn't change often - but in small digits - doesn't die right away
raft_recv_appendentries_response(data->raft_server, other_node, res);

Basically, the whole thing is initialized to zero and if I don't set something in those other fields, it is guaranteed to fail that assert....

I've created a unit test for it here:

https://gist.github.com/pmembrey/f581e503ae303c06098a

Hopefully this will make it obvious what I'm doing wrong :)

I'm running this on Ubuntu...

For the second issue I mentioned, I think it's being caused by a race condition. I'm going to try and replicate it tonight in a unit test. What seems to happen is this:

  • Node A is leader and sends out append entries to Node B
  • Node A receives append entries from Node B with higher term
  • Node A becomes a follower
  • Node A receives an append entries reply
  • Assert fails

Here's the assert again:

raftxmpp: src/raft_server.c:183: raft_recv_appendentries_response: Assertion `0 <= raft_node_get_next_idx(p)' failed.

Re: the raft_recv_appendentries_response issue, you should never hand populate the fields. You should use raft_recv_appendentries for this. That's why it's failing in your unit test. This is how to populate it:

msg_appendentries_response_t res;
int e = raft_recv_appendentries(sv->raft, conn->node_idx, &ae, &res);

I will release the example app tonight, so that should help explain things :)

Re: the 2nd issue, that looks interesting, I will have a look at that.

Re: the `0 <= raft_node_get_next_idx(p)' bug. I was able to reproduce this on ticketd. I've made the fix here: f297457

Can you please check if you're still experiencing the issue?

Cheers