gluster/glusterd2

stale brick process when volume stop operations are done in parallel in brick multiplexing mode

atinmu opened this issue · 8 comments

During stopBricks(), if we happen to send the volume stop requests in parallel, GD2 might fail to determine the last brick in the proc entry. IsLastBrickInProc() checks whether the portmapper holds only one entry for the given path, but the sign-out events are processed asynchronously and may arrive late; so even when the brick being detached is in fact the last one, the portmapper still holds more than one entry and IsLastBrickInProc() sees a count > 1.
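To make the race concrete, here is a minimal sketch of a count-based last-brick check. This is not the actual glusterd2 code; registry, ports and the function signature are hypothetical stand-ins for the portmapper internals. Because a detached brick stays in the map until its asynchronous sign-out is processed, the true last brick can still observe a count > 1:

```go
package pmap

import "sync"

// registry is a hypothetical stand-in for the portmapper state:
// port -> set of brick paths currently signed in on that port.
type registry struct {
	sync.RWMutex
	ports map[int]map[string]struct{}
}

var reg = &registry{ports: make(map[int]map[string]struct{})}

// IsLastBrickInProc reports whether brickPath is the only brick left
// on port. Entries are removed only when the asynchronous sign-out
// event is processed, so a brick that was already detached may still
// be counted here; the real last brick then sees len(bricks) > 1 and
// the process is never stopped, leaving it stale.
func IsLastBrickInProc(port int, brickPath string) bool {
	reg.RLock()
	defer reg.RUnlock()
	bricks := reg.ports[port]
	_, present := bricks[brickPath]
	return present && len(bricks) == 1
}
```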

We should move towards a list-based approach: store (in memory) all the bricks attached to the parent process and build the last-brick logic on top of that.

@atinmu I am having trouble understanding how a list might help. The portmapper we use currently is itself a kind of in-memory multi-list.

The portmapper is an in-memory data structure, but its values are (de)populated asynchronously based on the Sign In/Out events. So determining the number of bricks attached to a glusterfsd process by looking at the entries for its port can go wrong, depending on how fast or slow those events are received and processed by GD2 relative to the subsequent brick attach/detach requests. If instead we maintain a list that is (de)populated at the moment a brick is attached or detached, there is no asynchrony involved, and asynchrony is exactly the issue described here. I hope this clarifies the problem statement and why we need a separate data structure for the per-process brick data. In case it doesn't, please feel free to ask.
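A minimal sketch of that idea, with hypothetical names throughout: the per-process list is updated inside the attach/detach request path itself, before the request returns, so it can never lag behind the Sign In/Out events:

```go
package brickproc

import "sync"

// procBricks tracks, per glusterfsd pid, the brick paths currently
// attached to that process. All names here are illustrative.
type procBricks struct {
	mu     sync.Mutex
	bricks map[int][]string
}

func newProcBricks() *procBricks {
	return &procBricks{bricks: make(map[int][]string)}
}

// OnAttach runs synchronously in the brick attach request path.
func (pb *procBricks) OnAttach(pid int, path string) {
	pb.mu.Lock()
	defer pb.mu.Unlock()
	pb.bricks[pid] = append(pb.bricks[pid], path)
}

// OnDetach removes the brick and reports whether it was the last one
// in the process, so the caller can decide to terminate the process
// without depending on when the sign-out event arrives.
func (pb *procBricks) OnDetach(pid int, path string) (last bool) {
	pb.mu.Lock()
	defer pb.mu.Unlock()
	paths := pb.bricks[pid]
	for i, b := range paths {
		if b == path {
			pb.bricks[pid] = append(paths[:i], paths[i+1:]...)
			break
		}
	}
	if len(pb.bricks[pid]) == 0 {
		delete(pb.bricks, pid)
		return true
	}
	return false
}
```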

@atinmu @aravindavk I have one idea which doesn't involve adding a whole new data structure. Instead of just having a mapping of port->brickpath->pid, can we add some sort of state, like pending-delete? As soon as I send a detach request, I update the portmapper to change the state of that brickpath to pending-delete and decrease the number of active bricks on the port (or, when calculating IsLastBrickInProc(), take bricks in this state into account). When the sign-out request comes, it can match the pending-delete state and delete the brick entry.
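For illustration, a sketch of how that state could look (independent of the other sketches in this thread; the types are hypothetical): the detach path flips the brick to pending-delete synchronously, the active count skips such bricks, and the sign-out handler only deletes entries it finds in that state:

```go
package pmap

// brickState marks whether a brick entry is live or awaiting its
// sign-out event. Everything here is an illustrative sketch.
type brickState int

const (
	stateActive brickState = iota
	statePendingDelete
)

// portEntry holds one port's bricks: brick path -> state.
type portEntry struct {
	bricks map[string]brickState
}

// MarkPendingDelete runs synchronously in the detach request path,
// before the asynchronous sign-out arrives.
func (e *portEntry) MarkPendingDelete(path string) {
	e.bricks[path] = statePendingDelete
}

// activeCount counts only bricks not yet marked for deletion, so a
// slow sign-out no longer inflates the last-brick check.
func (e *portEntry) activeCount() int {
	n := 0
	for _, s := range e.bricks {
		if s == stateActive {
			n++
		}
	}
	return n
}

// OnSignOut deletes the entry only if it was marked pending-delete,
// guarding against stray or duplicate sign-out events.
func (e *portEntry) OnSignOut(path string) {
	if e.bricks[path] == statePendingDelete {
		delete(e.bricks, path)
	}
}
```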

How about removing the brickpath from the portmap registry as soon as the detach is issued? A sign-out coming after this can be safely discarded. Port collision will not happen, since glusterd2 is not choosing the port.
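A sketch of this variant, again with hypothetical names: the brick is removed from the registry synchronously when the detach is issued, and a later sign-out for a brick that is no longer present is treated as stale and discarded:

```go
package pmap

import "log"

// registryEntry is a hypothetical stand-in for one port's portmap
// entry: the set of brick paths registered on that port.
type registryEntry struct {
	bricks map[string]struct{}
}

// OnDetach removes the brick immediately, so the registry never lags
// behind; it reports whether this was the last brick on the port.
func (e *registryEntry) OnDetach(path string) (last bool) {
	delete(e.bricks, path)
	return len(e.bricks) == 0
}

// OnSignOut discards events for bricks already removed by OnDetach.
func (e *registryEntry) OnSignOut(path string) {
	if _, ok := e.bricks[path]; !ok {
		log.Printf("ignoring stale sign-out for brick %s", path)
		return
	}
	delete(e.bricks, path)
}
```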

@aravindavk Yes, I totally agree. We just need to think about whether this could have any negative effect in any use case once we scale up.

The only problem I can see is that when glusterd2 restarts, this registry is reset. So if a volume delete comes after the glusterd2 restart and before all bricks have signed in, the check will say that is the last brick, and glusterd2 may terminate the process even though other bricks exist in that process.

That is one issue. Unless we handle sign-in in GD2 as well.
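A sketch of what handling sign-in could look like, reusing the hypothetical registryEntry from the previous sketch: replaying sign-ins after a glusterd2 restart repopulates the registry, so an early volume delete no longer sees a falsely empty entry:

```go
package pmap

// registryEntry as in the earlier sketch: one port's brick-path set.
type registryEntry struct {
	bricks map[string]struct{}
}

// OnSignIn (re)inserts a brick. It is idempotent, so replaying
// sign-ins after a restart simply rebuilds the per-process view
// before any last-brick decision is made.
func (e *registryEntry) OnSignIn(path string) {
	if e.bricks == nil {
		e.bricks = make(map[string]struct{})
	}
	e.bricks[path] = struct{}{}
}
```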

Taking this out of the GCS/1.0 tag, considering we're not going to make brick multiplexing a default option in the GCS/1.0 release.