Cray-HPE/docs-csm

An observation on Clean_Up_After_a_BOS-BOA_Job_is_Completed_or_Cancelled.md

Closed this issue · 5 comments

Not so much an "issue" with, as an observation prompting a question on, the text in the file

operations/boot_orchestration/Clean_Up_After_a_BOS-BOA_Job_is_Completed_or_Cancelled.md

We read (my formatting):

ConfigMap for BOA:

This ConfigMap contains the configuration information that the BOA job uses.
The BOA pod mounts a ConfigMap named boot-session at /mnt/boot_session
inside the pod.
This ConfigMap has a random UUID name, such as
e786def5-37a6-40db-b36b-6b67ebe174ee.
This name does not obviously connect it to the BOA job.

However, I am currently in the process of cleaning up four days'
worth of BOS Sessions, created whilst trying to solve an issue
that eventually appeared to solve itself, and I am constantly seeing
that the ConfigMap's UUID doesn't appear to be random, but in
fact appears to be the same as the BOS Session ID, for example:

# export BOA_JOB_NAME=boa-7a98ae9d-512b-4623-8a90-4d3c6426e5fd
#
# export BOS_SESSION_ID=${BOA_JOB_NAME#boa-}
# echo  $BOS_SESSION_ID
7a98ae9d-512b-4623-8a90-4d3c6426e5fd
#
# kubectl -n services describe job ${BOA_JOB_NAME} | \
  grep -A3 boot-session:
   boot-session:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      7a98ae9d-512b-4623-8a90-4d3c6426e5fd
    Optional:  false
#

So, have I just been "lucky", or has the underlying process been
altered so that the BOS Session ID is now propagated into the
ConfigMap UUID for the BOA?

In which case, the documentation doesn't reflect that change.

Pinging @Cray-HPE/cms-core-bos.

@pawsey-kbuckley, @jsollom-hpe has created a JIRA to track this internally.

That's accurate; there is a 1-1 internal mapping between the created BOS session and the paired ConfigMap. This is a helpful coincidence more than anything else, but that is how the behavior is coded within the BOS v1 server code.
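For anyone cleaning up in the meantime, here is a minimal sketch of how that mapping could be used, assuming the BOS v1 behavior described above (the ConfigMap name equals the BOS session ID, which is the BOA job name without its `boa-` prefix). The function name `boa_configmap_name` is hypothetical and is not part of any CSM tooling; the example job name is the one from the report above. Since the mapping is described as a coincidence of the implementation, the sketch checks that the ConfigMap exists rather than taking the derived name on trust.

```shell
#!/bin/sh
# Hypothetical helper: derive the ConfigMap name from a BOA job name.
# Assumption: BOS v1 names the ConfigMap after the BOS session ID,
# i.e. the BOA job name with the leading "boa-" removed.
boa_configmap_name() {
    case "$1" in
        boa-?*) printf '%s\n' "${1#boa-}" ;;
        *) echo "not a BOA job name: $1" >&2; return 1 ;;
    esac
}

BOA_JOB_NAME=boa-7a98ae9d-512b-4623-8a90-4d3c6426e5fd
CONFIGMAP_NAME=$(boa_configmap_name "$BOA_JOB_NAME")
echo "$CONFIGMAP_NAME"

# Confirm the ConfigMap really exists before acting on the derived name.
# Skipped when kubectl is not installed on this host.
if command -v kubectl >/dev/null 2>&1; then
    kubectl -n services get configmap "$CONFIGMAP_NAME" \
        || echo "ConfigMap $CONFIGMAP_NAME not found" >&2
fi
```

If the lookup fails, the volume listing from `kubectl -n services describe job`, as shown in the original report, remains the way to find the ConfigMap actually mounted by the job.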

With BOS v2, there is no intermediate scheduling or data structure tucked away in k8s, so these issues become less important.

I will update the BOS v1 section of the readme.md document to indicate this.
