openshift/machine-config-operator

MCD fails with empty string in source

Xaenalt opened this issue · 2 comments

Description

While attempting a routine update, an indentation error caused the MCD to fail continually, leaving the node in a degraded, unrecoverable state similar to the one described in #1443.

The field in question ended up as an empty string:

        - contents:
            source: ''
          mode: 420
          overwrite: true
          path: /path/on/the/system
        - contents:
            source: ''
          mode: 420
          overwrite: true
          path: /path/on/the/system

The MCD hit Line 208 in

const (
	dataPrefix     = "data:"
	mediaSep       = '/'
	paramSemicolon = ';'
	paramEqual     = '='
	dataComma      = ','
)

// start lexing by detecting data prefix
func lexBeforeDataPrefix(l *lexer) stateFn {
	if strings.HasPrefix(l.input[l.pos:], dataPrefix) {
		return lexDataPrefix
	}
	return l.errorf("missing data prefix")
}
which caused it to crash-loop with "missing data prefix" errors, go into the degraded state, and be unable to recover.
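
To illustrate why an empty source trips this check, here is a minimal sketch that repeats only the prefix test from the excerpt. The `hasDataPrefix` helper is hypothetical and exists only for this illustration; the real lexer carries more state than is shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// dataPrefix mirrors the constant in the lexer excerpt above.
const dataPrefix = "data:"

// hasDataPrefix is a hypothetical helper that repeats only the prefix
// check made by lexBeforeDataPrefix; it is not part of the real lexer.
func hasDataPrefix(source string) bool {
	return strings.HasPrefix(source, dataPrefix)
}

func main() {
	// An empty string cannot start with "data:", so the lexer would take
	// the "missing data prefix" branch. "data:," is the shortest input
	// that passes this particular check.
	for _, source := range []string{"", "data:,", "data:,hello"} {
		fmt.Printf("%q -> has data prefix: %t\n", source, hasDataPrefix(source))
	}
}
```

An empty `source` therefore fails before any of the actual content is parsed, which matches the error seen in the crash loop.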

The only way we found to recover was to edit the rendered config that the MCO was attempting to apply. Fixing the error in the base MachineConfig was ineffective, because that only queued a new rendered MachineConfig to apply.

Steps to reproduce the issue:

  1. Create any MachineConfig object whose source field is an empty string
  2. The MCO will render the config and attempt to apply it
  3. The MCD will crash-loop

Describe the results you received:
The MCD is unable to apply the config, crash-loops with "missing data prefix" errors, and goes into the degraded state. Recovery requires non-intuitive manual intervention.

Describe the results you expected:
At the very least, a more descriptive error message would help. Ideally, the MCO would reject the config outright, or resolve an empty string to 'data:,' to avoid this error, preferably while warning the user.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/lifecycle frozen