kubernetes/cloud-provider-aws

Support Region for DescribeInstances Call

atsai1220 opened this issue · 8 comments

What would you like to be added:

  • I would like the cloud controller manager's DescribeInstances query to use the region of the instance being looked up.

Why is this needed:

  • This would allow us to create workers in another region (setting storage needs aside).

Questions:

  • Are there repercussions to disabling the node lifecycle controller?

Findings:
Currently, nodes from another region are reported as "not found" by the node lifecycle controller and are promptly deleted from the cluster after joining.

Logs on 1.30.1:

node_controller.go:240] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing
node_controller.go:425] Initializing node ip-10-117-161-37.ap-southeast-2.compute.internal with cloud provider
node_controller.go:229] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing

Configuration

      containers:
        - args:
            - '--v=2'
            - '--cloud-provider=aws'
            - '--configure-cloud-routes=false'
          image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.30.1

/kind feature

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

I can see from CloudTrail that it was looking for the instance in the wrong region and receiving an error:

{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": ":i-07143150147441be0",
        "arn": "arn:aws:sts:::assumed-role//i-07143150147441be0",
        "accountId": "",
        "accessKeyId": "",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "",
                "arn": "arn:aws:iam:::role/",
                "accountId": "982008609023",
                "userName": ""
            },
            "attributes": {
                "creationDate": "2024-06-03T17:16:33Z",
                "mfaAuthenticated": "false"
            },
            "ec2RoleDelivery": "2.0"
        }
    },
    "eventTime": "2024-06-03T21:58:18Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "DescribeInstances",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "",
    "userAgent": "kubernetes/v1.26.13 aws-sdk-go/1.44.116 (go1.20.13; linux; amd64)",
    "errorCode": "Client.InvalidInstanceID.NotFound",
    "errorMessage": "The instance ID 'i-03c4ae677928450eb' does not exist",
    "requestParameters": {
        "instancesSet": {
            "items": [
                {
                    "instanceId": "i-03c4ae677928450eb"
                }
            ]
        },
        "filterSet": {}
    },
    "responseElements": null,
    "requestID": "474a349c-7107-4df1-9e5e-489a1bc9b606",
    "eventID": "b7b55185-9287-42a7-bad1-6fa628fa05a6",
    "readOnly": true,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "982008609023",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.3",
        "cipherSuite": "TLS_AES_128_GCM_SHA256",
        "clientProvidedHostHeader": "ec2.us-west-2.amazonaws.com"
    }
}

You can set the region using the cloud config: https://github.com/kubernetes/cloud-provider-aws/blob/master/pkg/providers/v1/config/config.go#L25. Can you try that?

Setting the region in the config won't allow you to handle instances in multiple regions, but it will allow you to e.g. run the AWS CCM on-prem or in another region.

The AWS CCM assumes that your resources are in a single region in many places. I'm not necessarily opposed to changing this in the future, but it will require changes far beyond the DescribeInstances calls.

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale