Implement proper UpdateStrategy for Microservices
Rolling Updates
Currently, we're not doing rolling updates nicely. We cut over as soon as a new release is available in the Microservice, but that doesn't mean it's actually available yet (boot times etc.).

We should take several steps to make this a proper rolling update.
UpdateStrategy
We'll want to move the UpdateStrategy from the NetworkPolicy to the Microservice. The main motivator for this is that we will eventually also have non-HTTP applications running through Heighliner, like background workers and temporary jobs.

From an availability point of view, the Microservice is the best place to determine which release should actually be marked as the active release. In an HTTP scenario, this means it is the release to which traffic should be routed.
Minimum Available
The UpdateStrategy should have a MinimumAvailable field, allowing the user to specify how many versions they want to keep alive. This includes the active version.

For rolling upgrades, if MinimumAvailable is set to 1, we will not delete the older active version until the new version is in a state where we're happy with its availability.

The default value will be 3.
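To make the garbage-collection side of this concrete, here's a minimal sketch of pruning with MinimumAvailable. The `releasesToDelete` helper and the trimmed-down `Release` type are hypothetical illustrations, not part of the proposal, and the sketch ignores the readiness gating described above (it assumes the newest releases are already healthy).

```go
package main

import "fmt"

// Release is a simplified stand-in for the real Heighliner type.
type Release struct {
	Version string
	Active  bool
}

// releasesToDelete returns the releases that can be garbage collected,
// keeping the newest minimumAvailable releases. The active release counts
// towards that number. Input is assumed to be sorted newest-first.
func releasesToDelete(releases []Release, minimumAvailable int) []Release {
	if len(releases) <= minimumAvailable {
		return nil
	}
	return releases[minimumAvailable:]
}

func main() {
	rs := []Release{
		{"1.2.3", true}, {"1.2.2", false}, {"1.2.1", false}, {"1.2.0", false},
	}
	// With the default MinimumAvailable of 3, only the oldest release goes.
	for _, r := range releasesToDelete(rs, 3) {
		fmt.Println(r.Version)
	}
}
```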
Strategy: latest
The latest strategy is fairly simple and doesn't currently take any arguments. It will look at available images for the ReleaseGroup and deploy a new version if one is available in the ImagePolicy. Once this new version is deployed and available to serve requests, it will automatically be marked as active (see more on statuses here). Other versions will be cycled accordingly.
Strategy: manual
manual takes priority over latest. Both can be set, but when manual is set the cluster will choose it as the desired version to be marked as active (see more on statuses here).
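The priority rule could be resolved with something like the sketch below. The types mirror the UpdateStrategy proposed later in this issue, but `desiredVersion` is a hypothetical helper added for illustration.

```go
package main

import "fmt"

// Trimmed versions of the proposed strategy types.
type SemVerRelease struct{ Version string }

type ManualUpdateStrategy struct{ SemVer *SemVerRelease }
type LatestUpdateStrategy struct{}

type UpdateStrategy struct {
	Manual *ManualUpdateStrategy
	Latest *LatestUpdateStrategy
}

// desiredVersion resolves which version should be marked active:
// manual takes priority over latest whenever it is set.
func desiredVersion(s UpdateStrategy, latestAvailable string) string {
	if s.Manual != nil && s.Manual.SemVer != nil {
		return s.Manual.SemVer.Version
	}
	return latestAvailable
}

func main() {
	both := UpdateStrategy{
		Manual: &ManualUpdateStrategy{SemVer: &SemVerRelease{Version: "1.2.3"}},
		Latest: &LatestUpdateStrategy{},
	}
	fmt.Println(desiredVersion(both, "1.3.0")) // manual wins
	fmt.Println(desiredVersion(UpdateStrategy{Latest: &LatestUpdateStrategy{}}, "1.3.0"))
}
```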
Microservice Changes
To reflect the UpdateStrategy, I propose we add the following to the Microservice.Spec:

```go
// MicroserviceSpec represents the specification for a Microservice. It houses
// all the policies which we'll use to build a VersionedMicroservice.
type MicroserviceSpec struct {
	// …

	// UpdateStrategy represents the way this Microservice will roll over from
	// one version to another.
	UpdateStrategy UpdateStrategy `json:"updateStrategy"`
}
```
We will keep using the existing UpdateStrategy:

```go
type UpdateStrategy struct {
	Manual *ManualUpdateStrategy `json:"manual"`
	Latest *LatestUpdateStrategy `json:"latest"`
}

type ManualUpdateStrategy struct {
	SemVer *SemVerRelease `json:"semVer"`
}

type LatestUpdateStrategy struct{}
```
We also want to update the Microservice.Status data format. We want to introduce the concept of a ReleaseGroup. We internally already do this at the NetworkPolicy level, but we should do it a level higher up so it's easily visible to the rest of the cluster.

```go
// MicroserviceStatus represents the status a specific Microservice is in.
type MicroserviceStatus struct {
	ReleaseGroups []ReleaseGroup `json:"releaseGroups"`
}
```
ReleaseGroup
A ReleaseGroup is a grouping of releases that fall within a specific release level. For example, when a cluster is configured to monitor Preview releases, a specific release group would be a single PR. We can have multiple PRs; each PR would be its own ReleaseGroup.

When we choose to use a ReleaseMinor versioning level, a release group could be 1.2.x or it could be 1.3.x. All the versions within the x range would then fall within that specific group.
```go
// ReleaseGroup represents a group of releases that are tied together through
// their versioning configuration. A ReleaseGroup could be represented by a
// single Pull Request like `add-feature-x`, or it could be represented by a
// specific tag version, like `1.2.x`.
type ReleaseGroup struct {
	// Name represents the name of the release group. For PRs this will be
	// represented by the Pull Request name. For (Pre)Releases this will be
	// represented by the filtered tag.
	Name string `json:"name"`

	// Releases represents the releases that are available for the specific
	// ReleaseGroup.
	Releases []Release `json:"releases"`
}
```
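As an illustration of the grouping behaviour, here's a sketch of bucketing a flat list of releases into ReleaseGroups. The flat `release` input type and the `groupReleases` helper are hypothetical; the real grouping would come from the versioning configuration.

```go
package main

import "fmt"

// release is an illustrative flat input: Group is the PR name (e.g.
// "add-feature-x") or the filtered tag (e.g. "1.2.x") it belongs to.
type release struct {
	Group   string
	Version string
}

type ReleaseGroup struct {
	Name     string
	Releases []release
}

// groupReleases buckets releases by group name, preserving first-seen
// order so the output is stable.
func groupReleases(rs []release) []ReleaseGroup {
	index := map[string]int{}
	var groups []ReleaseGroup
	for _, r := range rs {
		i, ok := index[r.Group]
		if !ok {
			i = len(groups)
			index[r.Group] = i
			groups = append(groups, ReleaseGroup{Name: r.Group})
		}
		groups[i].Releases = append(groups[i].Releases, r)
	}
	return groups
}

func main() {
	rs := []release{
		{"1.2.x", "1.2.0"}, {"add-feature-x", "abc123"}, {"1.2.x", "1.2.1"},
	}
	for _, g := range groupReleases(rs) {
		fmt.Println(g.Name, len(g.Releases))
	}
}
```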
Release
The Release itself will reflect a minor change as well. We'll add a Status field to the release. We can have several statuses:

- Active: the main active release, the release we actually want to use
- Canary: a temporary deployment which is active, but doesn't receive the main traffic
- Error: a temporary deployment which isn't active for some reason
- Deprecated: a release that will be removed shortly
```go
type ReleaseStatus string

const (
	// ReleaseStatusActive refers to a release which receives the application's
	// main traffic.
	ReleaseStatusActive ReleaseStatus = "active"

	// ReleaseStatusCanary refers to a release which is healthy and receives the
	// application's test traffic.
	ReleaseStatusCanary ReleaseStatus = "canary"

	// ReleaseStatusError refers to a release which encountered an error whilst
	// trying to deploy.
	ReleaseStatusError ReleaseStatus = "error"

	// ReleaseStatusDeprecated refers to a release which will be removed
	// shortly.
	ReleaseStatusDeprecated ReleaseStatus = "deprecated"
)
```
```go
// Release represents a specific release for a version of an image.
type Release struct {
	// …

	// Status represents the current status of this specific Release.
	Status ReleaseStatus `json:"status"`
}
```
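With statuses on the Release, finding the release a consumer should use becomes a simple lookup within a group. The `activeRelease` helper below is a hypothetical sketch, with the types trimmed to what the example needs.

```go
package main

import "fmt"

type ReleaseStatus string

const (
	ReleaseStatusActive ReleaseStatus = "active"
	ReleaseStatusCanary ReleaseStatus = "canary"
)

type Release struct {
	Version string
	Status  ReleaseStatus
}

type ReleaseGroup struct {
	Name     string
	Releases []Release
}

// activeRelease returns the release marked active within a group, if any.
func activeRelease(g ReleaseGroup) (Release, bool) {
	for _, r := range g.Releases {
		if r.Status == ReleaseStatusActive {
			return r, true
		}
	}
	return Release{}, false
}

func main() {
	g := ReleaseGroup{Name: "1.2.x", Releases: []Release{
		{Version: "1.2.4", Status: ReleaseStatusCanary},
		{Version: "1.2.3", Status: ReleaseStatusActive},
	}}
	if r, ok := activeRelease(g); ok {
		fmt.Println(r.Version)
	}
}
```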
NetworkPolicy Changes
In the NetworkPolicy.Spec, we'd remove the UpdateStrategy field and always rely on the active release for a ReleaseGroup in the Microservice.Status.
We'll do what we do now and use the labels of the Microservice Release where the Status is set to Active. We'll then update the main Service to point to these labels instead.
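That label handoff might look like the sketch below: copy the active release's labels and use them as the Service selector. The `selectorForActive` helper and the `hlnr.io/version` label key are assumptions for illustration, not the actual Heighliner labels.

```go
package main

import "fmt"

// Release is trimmed to the fields this example needs.
type Release struct {
	Status string
	Labels map[string]string
}

// selectorForActive returns a copy of the active release's labels, suitable
// for use as a Service selector. It returns nil if no release is active.
func selectorForActive(releases []Release) map[string]string {
	for _, r := range releases {
		if r.Status != "active" {
			continue
		}
		selector := make(map[string]string, len(r.Labels))
		for k, v := range r.Labels {
			selector[k] = v
		}
		return selector
	}
	return nil
}

func main() {
	rs := []Release{
		{Status: "canary", Labels: map[string]string{"hlnr.io/version": "1.2.4"}},
		{Status: "active", Labels: map[string]string{"hlnr.io/version": "1.2.3"}},
	}
	fmt.Println(selectorForActive(rs)["hlnr.io/version"])
}
```

Copying the labels (rather than referencing them) keeps the Service stable while a new release boots; the selector only changes once the controller flips the Active status.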
By doing this, we should achieve zero downtime, as the Microservice controller only marks a Release as Active once it's happy with its availability.
The NetworkPolicy will now be simpler and just iterate over the ReleaseGroups. We can also introduce a new Template Variable, ReleaseGroupName, to the templating system for the Domains.
+1 on ReleaseGroup. I'd prefer the name ReleaseStream though. This is something captured in the stream stuff for PR support, but to handle major/minor/patch etc. upgrades, having a proper concept for it would be very useful.

-1 on moving this control from NetworkPolicy to Microservice. Having control over which bits serve traffic, and allowing multiple domains to point to the same vsvc, is a big differentiator for hlnr. Is it actually a useful one? I dunno, but I like to think so :)

The Microservice should handle garbage collection of vsvcs based on a configured number of live instances, the release streams in play, and the vsvcs that have network policies pointed at them. The NetworkPolicy should handle updating to a new vsvc, if configured to do so, based on vsvc readiness.

To support canarying and rolling deploys along with blue/green, a NetworkPolicy that updates can inform the vsvcs in play to adjust themselves.

For microservices that don't have an HTTP component... I dunno. I think we can worry about that afterwards. Maybe it looks like what you've outlined, or maybe it's a new API object.