openvstorage/alba

get-disk-safety failed with 'Namespace manager exception: Nsm_model.Err.Namespace_id_not_found'.

Closed this issue · 4 comments

Problem description

Monitoring revealed the following suboptimal behavior with the get-disk-safety command:

CRIT - EXCEPTION HC000 - Could not fetch alba information for backend nvmebackend Message: Command 'get-disk-safety' failed with 'Namespace manager exception: Nsm_model.Err.Namespace_id_not_found'.

What could have happened:

  • Another health check was running a test that creates and removes namespaces
  • get-disk-safety was called while that test's namespace was being deleted

Proposed solution

The whole command should not fail when one namespace cannot be fetched. Perhaps return the output collected so far and add a section listing the namespaces that raised an exception?

domsj commented

Which alba version?
I think this is something that should be fixed in 1.3.7, see #633.

The alba version is 1.3.7

root@ovs05:~# alba version
1.3.7
git_revision: "tags/1.3.7-0-gfb75d47"
git_repo: "Not available"
compile_time: "09/03/2017 12:41:35 UTC"
machine: "51ce1efbe55d 4.4.0-36-generic x86_64 x86_64 x86_64 GNU/Linux"
model_name: "Intel Xeon CPU E31220 @ 3.10GHz"
compiler_version: "4.03.0"
dependencies:

On OVH we forgot to restart the proxies after updating to 1.3.7.
Closing the ticket; if I still see this issue I will reopen it.

The problem is not solved. Last night we received an urgent alert with the same error.
Alba version: 1.3.8