GoogleCloudPlatform/berglas

Berglas interacts badly with tools that rely on process wrapping like Argo-workflows

RmStorm opened this issue · 3 comments

Argo-workflows is a widely used job/pipeline tool for Kubernetes. Its controller wraps a command in another process that handles cancelation and similar concerns. Sadly, when that process is itself wrapped by Berglas and Berglas hangs for whatever reason, the whole thing gets stuck. We are using Berglas with a mutating webhook in Kubernetes, following the setup in this example, and it is these lines that cause the problem. We have resolved the issue by replacing those lines with this snippet:

	if len(c.Command) == 1 && c.Command[0] == "/var/run/argo/argoexec" && len(c.Args) >= 2 && c.Args[0] == "emissary" && c.Args[1] == "--" {
		argPrefix := []string{"emissary", "--", binVolumeMountPath + "berglas", "exec", "--"}
		moreArgs := c.Args[2:]
		m.logger.Infof("Processing command %v %v as Argo", c.Command, c.Args)
		c.Args = append(argPrefix, moreArgs...)
		m.logger.Infof("Processed command as Argo: resulting command line %v", c.Args)
	} else {
		m.logger.Infof("Processing command %v %v as non-Argo", c.Command, c.Args)
		// Prepend the command with berglas exec --
		original := append(c.Command, c.Args...)
		c.Command = []string{binVolumeMountPath + "berglas"}
		c.Args = append([]string{"exec", "--"}, original...)
	}

This solves our problem but it is a very narrow solution. Does anyone have better ideas for how to resolve this issue in a more structural way?

It sounds like argo-workflows and Berglas do the same thing: each updates the start command to be its own binary and then fake-execs the subprocess. Berglas expects the child process to exit.

This issue is stale because it has been open for 14 days with no
activity. It will automatically close after 7 more days of inactivity.
