kubernetes/kops

Migrate to aws-sdk-go-v2

rifelpet opened this issue · 4 comments

/kind cleanup

https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-go-v1-on-july-31-2025/

We'll need to migrate to github.com/aws/aws-sdk-go-v2 before v1 enters maintenance mode

Grouped into packages that likely need to be updated together

  • EC2 cloudmock, pkg/resources/aws, awsup, awstasks, pkg/apis/kops/validation, pkg/model, cloudup/new_cluster.go, template_functions.go, testutils
  • Autoscaling cloudmock, pkg/resources/aws, awsup, awstasks, pkg/instancegroups, pkg/model
  • ELB cloudmock, pkg/resources/aws, awsup, awstasks
  • ELBv2 cloudmock, pkg/resources/aws, awsup, awstasks, testutils
  • Eventbridge cloudmock, pkg/resources/aws, awsup, awstasks, pkg/model
  • IAM cloudmock, pkg/resources/aws, awsup, awstasks, pkg/model, testutils
  • SQS cloudmock, pkg/resources/aws, awsup, awstasks
  • Route53 cloudmock, pkg/resources/aws, awsup, awstasks, dnscontroller, testutils
  • SSM awsup
  • EC2 pkg/nodeidentity
  • EC2 protokube
  • EC2 nodeup
  • EC2 cmd/kops
  • STS nodeup, kops-controller
  • pkg/zones

Two design questions I'm soliciting input on:

  1. Most integer types in the SDK have changed from *int64 to *int32. Kops API has *int64 fields that are passed directly through the model and tasks into the SDK types:

    type WarmPoolSpec struct {
    // MinSize is the minimum size of the pool
    MinSize int64 `json:"minSize,omitempty"`
    // MaxSize is the maximum size of the warm pool. The desired size of the instance group
    // is subtracted from this number to determine the desired size of the warm pool
    // (unless the resulting number is smaller than MinSize).
    // The default is the instance group's MaxSize.
    MaxSize *int64 `json:"maxSize,omitempty"`

    As-is we have to convert from *int64 to *int32:
    warmPoolTask.MaxSize = fi.PtrTo(int32(aws.ToInt64(warmPool.MaxSize)))

    Should we update the Kops API to use *int32 ? I realize some of them are used by other cloud providers, but at least some of them are AWS-only.

  2. All of the SDK's string enums now have enum-typed fields rather than more generic *string fields. Should we update the awstasks types to use the enum types and pass them in from the model? Or have tasks continue to use string and do the enum casting in their Find and Render methods. For example, the instance type:
    https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/ec2/types#InstanceType

    https://github.com/aws/aws-sdk-go-v2/blob/0fde27cdffe0657695258e5d5220f7487117e71d/service/ec2/types/enums.go#L2959-L2968

cc @hakman @justinsb

  1. I don't think there's much value in keeping the int64 for the AWS-only ones.
  2. I like the idea of passing types from the model. I think we tried that in other clouds or maybe even in some AWS models.

After #16460 theres only a few remaining uses of the v1 SDK:

  1. Getting a list of known zones and regions:

    kops/pkg/zones/wellknown.go

    Lines 247 to 254 in 1c24423

    func WellKnownZonesForCloud(matchCloud kops.CloudProviderID, prefix string) []string {
    ctx := context.Background()
    var found []string
    switch matchCloud {
    case kops.CloudProviderAWS:
    prefix = strings.ToLower(prefix)
    for _, partition := range endpoints.DefaultResolver().(endpoints.EnumPartitions).Partitions() {

    The v2 SDK no longer maintains a static list of zones and regions, it uses regex to determine partitions and defaults unrecognized regions to the commercial partition in order to be forwards compatible with future regions. If we want to enumerate valid regions or zones we'd need to use the EC2 API which is likely not appropriate here given it is used in shell completion and where the cloud provider may not be known.

  2. The presigned STS GetCallerIdentity request used for node bootstrap via kops-controller. While presigned requests are still supported in V2, the presign methods and types no longer provide access to the request body, only their url and headers. See aws/aws-sdk-go-v2#1137. Kops-controller currently reads the request body to perform some validation:

    requestBytes, _ := io.ReadAll(stsRequest.Body)
    _, _ = stsRequest.Body.Seek(0, io.SeekStart)
    if stsRequest.HTTPRequest.Header.Get("Content-Length") != strconv.Itoa(len(requestBytes)) {
    return nil, fmt.Errorf("incorrect content-length")
    }

    In V1 the presigned request is a POST however in V2 it is converted to a GET request and the normal Action=GetCallerIdentity&Version=2011-06-15 body is moved to URL query parameters:
    https://github.com/aws/aws-sdk-go-v2/blob/bc2a669d3241023e20194cdfe042b8c275887e51/service/sts/api_client.go#L641-L645
    This thread on the original kops-controller PR discusses potential upgrade challenges. We should be able to modify this safely, it will just require some extra care.

Theres also still a few transitive dependencies on the V1 SDK that would need to be upgraded themselves before it will be removed from our vendor directory entirely.

github.com/aws/amazon-ec2-instance-selector/v2@v2.4.2-0.20231216170552-14d4dfcbaadf github.com/aws/aws-sdk-go@v1.46.6
github.com/cert-manager/cert-manager@v1.14.4 github.com/aws/aws-sdk-go@v1.49.13
k8s.io/cloud-provider-aws@v1.29.2 github.com/aws/aws-sdk-go@v1.44.241
  1. We can hardcode it as for the rest. I is best effort anyway for the purpose of cli completion.
  2. I think a PR would help here. Generally speaking, we need to move to the V2 SDK, so best to do it sooner rather than later.