hashicorp/raft

cannot take snapshot

nnsgmsone opened this issue · 9 comments

I found that a newly added node repeatedly tries to create a snapshot. Why?

[Screenshot: 2019-04-18 10-32-43]
The log is shown above.

First, I added the following log statement to the code:

[Screenshot: 2019-04-18 18-20-50]
Then I ran two nodes and added a third node after processing some data. The following error occurs:
[Screenshot: 2019-04-18 18-23-47]
[Screenshot: 2019-04-18 18-24-16]
[Screenshot: 2019-04-18 18-24-43]
It seems to enter an infinite loop. How can I fix this bug?

I found the cause of this problem: only logs of type LogCommand update lastIndex:
[Screenshot: 2019-04-19 10-32-29]
So commitIndex is updated, but lastIndex is not. As a result, it always enters the following code path, which leads to an infinite loop:
[Screenshot: 2019-04-19 11-15-59]

I think you could compare snapReq.index with commitIndex before calling the user callback. Why isn't that done? Is there any special consideration?

Hey @nnsgmsone, thank you so much for all this information. Are you able to push up your code into a repository so we can take a deeper look into it? I feel like it would help give us a lot of additional context.
Thank you!

Ok, I will write a simple example and send it to you

@s-christoff the repository is https://github.com/nnsgmsone/raft_test.git. My steps: start two nodes first, then call the client to send requests and wait for a snapshot to be created. Then start a new node, and you will see the problem.

For the record, I've seen this in our production system as well.

@james-lawrence how do you deal with this problem?

stale commented

Hey there, this issue has been automatically closed because there hasn't been any activity for a while. If you are still experiencing problems, or still have questions, feel free to open a new one :+1: