openvstorage/alba

list-all-osds lists OSDs previously purged

Opened this issue · 7 comments

This issue occurs when there are multiple ALBA Backends.

In [41]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:from ovs.extensions.plugins.albacli import AlbaCLI
:
:config1 = 'arakoon://config/ovs/arakoon/bend1-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini'
:config2 = 'arakoon://config/ovs/arakoon/bend2-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini'
:asds1 = sorted(AlbaCLI.run(command='list-all-osds', config=config1, named_params={'node-id': 'CSRhmZt549qcKXwdW3PvfEcNlFsvr8aU'}), key=lambda k: k['long_id'])
:asds2 = sorted(AlbaCLI.run(command='list-all-osds', config=config2, named_params={'node-id': 'CSRhmZt549qcKXwdW3PvfEcNlFsvr8aU'}), key=lambda k: k['long_id'])
:
:for asd in asds1:
:    print asd['long_id'], asd['decommissioned'], asd['alba_id']
:print ''
:for asd in asds2:
:    print asd['long_id'], asd['decommissioned'], asd['alba_id']
:--
GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq False 47b887a4-770c-47c4-8027-44090c3c9098
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq False 47b887a4-770c-47c4-8027-44090c3c9098
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

The output above shows the ASDs listed by backends bend1 and bend2.
Both backends report the same ASDs: 2 ASDs have been claimed by bend1 and 1 ASD has been claimed by bend2.

Now when removing an ASD from bend1 (GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq), we see it's being reported as decommissioned by bend1, but obviously this is not the case for bend2, as seen below:

GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq True 47b887a4-770c-47c4-8027-44090c3c9098
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
mWTgD0LuxadmUpYdi2Ues4zj3HZvd0Xl False None
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq False 47b887a4-770c-47c4-8027-44090c3c9098
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
mWTgD0LuxadmUpYdi2Ues4zj3HZvd0Xl False None
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

After maintenance has completely purged the ASD, I get this output:

In [53]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:from ovs.extensions.plugins.albacli import AlbaCLI
:# AlbaCLI.run(command='claim-osd', config='arakoon://config/ovs/arakoon/bend2-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini', named_params={'long-id': 'Vw3Ku4TKIq6A9ohGIgDIMDjURecUTQdR'})
:
:
:config1 = 'arakoon://config/ovs/arakoon/bend1-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini'
:config2 = 'arakoon://config/ovs/arakoon/bend2-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini'
:asds1 = sorted(AlbaCLI.run(command='list-all-osds', config=config1, named_params={'node-id': 'CSRhmZt549qcKXwdW3PvfEcNlFsvr8aU'}), key=lambda k: k['long_id'])
:asds2 = sorted(AlbaCLI.run(command='list-all-osds', config=config2, named_params={'node-id': 'CSRhmZt549qcKXwdW3PvfEcNlFsvr8aU'}), key=lambda k: k['long_id'])
:
:for asd in asds1:
:    print asd['long_id'], asd['decommissioned'], asd['alba_id']
:print ''
:for asd in asds2:
:    print asd['long_id'], asd['decommissioned'], asd['alba_id']
:--
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
mWTgD0LuxadmUpYdi2Ues4zj3HZvd0Xl False None
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq False 47b887a4-770c-47c4-8027-44090c3c9098
Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O False None
KoL16B186t2U6Est7IZ8zQHhgon84Gii False None
L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf False 47b887a4-770c-47c4-8027-44090c3c9098
kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj False eef292ff-83e5-47ae-bc5a-5f017e290731
mWTgD0LuxadmUpYdi2Ues4zj3HZvd0Xl False None
ynQAKppbeDCM9puOwtIxOIFXgT1kNLU1 False None

The ASD with ID GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq is still being reported by bend2, but no longer by bend1.
The more ASDs get claimed and removed between different Backends, the more the list-all-osds outputs diverge.
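
To make the divergence easier to spot than by eyeballing two printed lists, the comparison could be sketched as below. This is only a sketch: `diff_osd_listings` is a hypothetical helper, not part of AlbaCLI, and it works on plain dicts shaped like the list-all-osds entries printed above (`long_id`, `decommissioned`, `alba_id`). The sample data is a hand-copied subset of the last output.

```python
def diff_osd_listings(osds_a, osds_b):
    """Compare two list-all-osds results, keyed by long_id.

    Hypothetical helper: expects lists of dicts with at least
    'long_id' and 'decommissioned' keys.
    """
    by_id_a = dict((osd['long_id'], osd) for osd in osds_a)
    by_id_b = dict((osd['long_id'], osd) for osd in osds_b)
    ids_a, ids_b = set(by_id_a), set(by_id_b)
    return {
        # OSDs that only one of the two backends still reports
        'only_in_a': sorted(ids_a - ids_b),
        'only_in_b': sorted(ids_b - ids_a),
        # OSDs both report, but with a different decommissioned flag
        'decommissioned_differs': sorted(
            long_id for long_id in ids_a & ids_b
            if by_id_a[long_id]['decommissioned'] != by_id_b[long_id]['decommissioned']
        ),
    }


# Subset of the output above, after the purge on bend1
BEND1_ALBA_ID = '47b887a4-770c-47c4-8027-44090c3c9098'
asds1 = [
    {'long_id': 'Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O', 'decommissioned': False, 'alba_id': None},
    {'long_id': 'L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf', 'decommissioned': False, 'alba_id': BEND1_ALBA_ID},
]
asds2 = asds1 + [
    {'long_id': 'GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq', 'decommissioned': False, 'alba_id': BEND1_ALBA_ID},
]

result = diff_osd_listings(asds1, asds2)
```

With real data, `asds1` and `asds2` would be the two `AlbaCLI.run(command='list-all-osds', ...)` results from the paste above; here `result['only_in_b']` contains only the purged ASD.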

domsj commented

This is (to me at least) expected.
If we don't want this behaviour we should either

  • not automatically track (add) all osds in a backend (now asds are discovered automatically and added to all backends/abms)
  • remove asds that are claimed by another backend, and
    have the framework remove an asd from all backends when it is destroyed
  • some process could periodically remove all osds from a backend that are not claimed by that particular backend (this could race with claim-osd actions triggered from the gui though)

Probably the most workable/desirable solution is the first one (no more asd discovery).
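
For illustration only, the selection step of the third option could look roughly like this. `select_purge_candidates` and its `include_unclaimed` flag are hypothetical names, not an existing ALBA or framework API, and the actual removal call is left out because the command is not named in this thread. By default the sketch skips unclaimed osds (`alba_id` of `None`), which is more conservative than the option as worded and is meant to narrow, not eliminate, the race with claim-osd.

```python
def select_purge_candidates(osds, own_alba_id, include_unclaimed=False):
    """Return long_ids of osds this backend tracks but has not claimed.

    Hypothetical helper. `osds` is a list of dicts shaped like the
    list-all-osds entries ('long_id', 'alba_id'); `own_alba_id` is the
    alba_id of the backend doing the cleanup.
    """
    candidates = []
    for osd in osds:
        claimed_by = osd['alba_id']
        if claimed_by == own_alba_id:
            continue  # claimed by this backend: keep
        if claimed_by is None and not include_unclaimed:
            continue  # still available: keep, it may be claimed later
        candidates.append(osd['long_id'])
    return sorted(candidates)


# Two alba_ids as they appear in the listings of this issue
BEND1_ALBA_ID = '47b887a4-770c-47c4-8027-44090c3c9098'
BEND2_ALBA_ID = 'eef292ff-83e5-47ae-bc5a-5f017e290731'
bend2_listing = [
    {'long_id': 'GZPEeAG8pf8PaFNCiXFetYJGBCEDcboq', 'alba_id': BEND1_ALBA_ID},
    {'long_id': 'Gr59Ibls3QClaCcHh6YgPAt85y3Ovt0O', 'alba_id': None},
    {'long_id': 'L5olADCC2ZQLxXDxJCB9LYSlWSZb1Ygf', 'alba_id': BEND1_ALBA_ID},
    {'long_id': 'kIGZw1aOe8Az9BP7BIOOa0sbp0ki2Kpj', 'alba_id': BEND2_ALBA_ID},
]

to_purge = select_purge_candidates(bend2_listing, BEND2_ALBA_ID)
```

A periodic job would still have to re-check each candidate right before removing it, since a claim-osd from the gui could land in between.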

domsj commented

Regarding my preference for no more asd discovery: that would also prevent issues such as #269 (maintenance from env1 trying to connect to asds from env2 - which it possibly can't even reach due to how the network is configured)

@domsj I believe there is a command to purge OSDs from monitoring, I assume the same can be used to remove it from list-all-osds?

domsj commented

yes, it can be used. the question is: when should it be used? (and the answer determines whether you fall into option 2 or 3 that I mentioned)

@kvanhijf why would you need a 'purified' list-all-osds? What is the context of this ticket? Can the extra command to purge from the monitoring be of any help?

@wimpers : The reason I reported this was just that it didn't seem logical to me to see these differences between ALBA backends. They're not causing any issues for us, but it might be good for OPS to know in which cases they can end up with such behavior.

BAM needs to lay its egg. Will we still do auto discovery now that project Golden Gate is coming our way?