openvstorage/alba

disqualified OSDs cause NoSatisfiablePolicy

The root cause seems to be an Invalid_argument String.blit exception, which is suspicious enough by itself:

Aug 22 01:57:44 NY1SRV0006 alba[15075]: 2017-08-22 01:57:44 003903 -0400 - NY1SRV0006 - 15075/0000 - alba/proxy - 4120 - error - Disqualifying osd 0: (Invalid_argument String.blit)
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834823 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1304 - info - "(Invalid_argument String.blit)" was unforeseen, invalidating pool
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834848 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1305 - info - "(Invalid_argument String.blit)": should_invalidate:true should_retry:false
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834905 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1306 - info - "(Invalid_argument String.blit)" was unforeseen, invalidating pool
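To find these events in a larger proxy log, the disqualification lines can be extracted with a small script. This is only a sketch: the regular expression is derived from the two line shapes quoted in this issue, and the names DISQUALIFY_RE and find_disqualifications are made up for illustration. The optional UUID-like token that sometimes follows the OSD id is captured as-is, without assuming what it identifies.

```python
import re

# Matches both line shapes quoted in this issue:
#   "... error - Disqualifying osd 0: (Invalid_argument String.blit)"
#   "... error - Disqualifying osd 0 <uuid> : Client_helper.MasterLookupResult.Error(0)"
DISQUALIFY_RE = re.compile(
    r"Disqualifying osd (?P<osd_id>\d+)"      # numeric OSD id
    r"(?:\s+(?P<ident>[0-9a-f-]{36}))?"       # optional UUID-like token
    r"\s*:\s*(?P<reason>.+)$"                 # the exception text
)


def find_disqualifications(lines):
    """Return (osd_id, ident_or_None, reason) for each disqualification line."""
    events = []
    for line in lines:
        match = DISQUALIFY_RE.search(line)
        if match:
            events.append(
                (int(match["osd_id"]), match["ident"], match["reason"].strip())
            )
    return events
```

Lines that only report "was unforeseen, invalidating pool" are skipped on purpose, so the result lists one entry per actual disqualification.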

Basically, any unexpected exception coming from a local backend that is used as an OSD in a global backend causes that local backend to be disqualified.
For example, a master switch of an NSM shows up as:

alba/proxy - 4146140 - info - a3926642-e2cc-4627-9c02-0d1f880e01fc "Client_helper.MasterLookupResult.Error(0)" was unforeseen, invalidating pool
alba/proxy - 4146141 - info - a3926642-e2cc-4627-9c02-0d1f880e01fc "Client_helper.MasterLookupResult.Error(0)": should_invalidate:true should_retry:false
alba/proxy - 4146142 - error - Disqualifying osd 0 a3926642-e2cc-4627-9c02-0d1f880e01fc : Client_helper.MasterLookupResult.Error(0)
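The behaviour described above can be modelled roughly as follows. This is a simplified Python sketch, not the actual Alba (OCaml) code: the Verdict type, the FORESEEN table and its "Timeout" entry, and the OsdTracker class are all hypothetical. Only the should_invalidate / should_retry flags and the "unforeseen leads to disqualification" rule are taken from the logs and the description in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    should_invalidate: bool
    should_retry: bool


# Hypothetical table of errors the client knows how to handle.
# The "Timeout" entry is an illustrative assumption, not taken from Alba.
FORESEEN = {
    "Timeout": Verdict(should_invalidate=True, should_retry=True),
}

# What the logs show for anything not foreseen.
UNFORESEEN = Verdict(should_invalidate=True, should_retry=False)


class OsdTracker:
    """Tracks which OSDs have been disqualified."""

    def __init__(self):
        self.disqualified = set()

    def report_error(self, osd_id, error_name):
        verdict = FORESEEN.get(error_name)
        if verdict is None:
            # Unforeseen error: the OSD is disqualified, even if the
            # underlying cause was harmless (e.g. a planned master switch).
            self.disqualified.add(osd_id)
            return UNFORESEEN
        return verdict
```

With this model, reporting "Client_helper.MasterLookupResult.Error(0)" for OSD 0 disqualifies it, because nothing in the table says the error is harmless.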

Since the OSD is disqualified, the global backend can get into trouble.
In this particular case, Arakoon had no real issue: the master switch was triggered deliberately by drop-master.
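Why a disqualified OSD can end in NoSatisfiablePolicy can be illustrated with a toy model. This is a sketch under assumptions: policies are modelled here as hypothetical (k, m, min_fragments) tuples, and a policy counts as satisfiable when at least min_fragments OSDs are still usable. The real Alba policy format and selection logic are not described in this thread.

```python
class NoSatisfiablePolicy(Exception):
    """Raised when no policy can be met with the OSDs that are still usable."""


def choose_policy(policies, all_osds, disqualified):
    """Return the first policy that the remaining OSDs can satisfy.

    policies: list of hypothetical (k, m, min_fragments) tuples, in
    order of preference.
    """
    usable = len(set(all_osds) - set(disqualified))
    for k, m, min_fragments in policies:
        if usable >= min_fragments:
            return (k, m, min_fragments)
    raise NoSatisfiablePolicy(
        f"{usable} usable OSDs cannot satisfy any of {policies}"
    )
```

For example, with three OSDs and a single policy that needs three fragments, disqualifying one OSD is already enough to raise NoSatisfiablePolicy, even though the OSD itself may be perfectly healthy.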

@toolslive, is this related to ticket #550, or can we fix this one while we're at it?

It's probably related. Also, the more namespace managers you have for the local backend, the more likely you are to run into this.

In essence, "disqualified OSDs cause NoSatisfiablePolicy" is not a bug. The bug here is that there
were plenty of simple scenarios that caused an OSD to be disqualified when it shouldn't have been.
We fixed a number of these cases.

From @toolslive: I would close this one. I'll open another whenever the remaining cases (re)surface.

Hence closing this one.