Support Region for DescribeInstances Call
atsai1220 opened this issue · 8 comments
What would you like to be added:
- I would like the cloud-controller-manager's DescribeInstances query to also use the instance's region.
Why is this needed:
- This allows us to create workers in another region (not considering storage needs).
Questions:
- What are the repercussions of disabling the node-lifecycle controller? (A sketch of disabling it follows just below.)
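For reference, a hedged sketch of what disabling it could look like, assuming the standard cloud-controller-manager --controllers flag (prefixing a name with - disables that controller) and the controller name cloud-node-lifecycle:

containers:
- args:
  - '--v=2'
  - '--cloud-provider=aws'
  - '--configure-cloud-routes=false'
  - '--controllers=*,-cloud-node-lifecycle'

Note that disabling it would also stop the CCM from cleaning up Node objects whose backing instances really were terminated, so those would have to be garbage-collected some other way.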
Findings:
Currently, nodes from another region are "not found" by the node-lifecycle
controller and are promptly deleted from the cluster after joining.
Logs on 1.30.1:
node_controller.go:240] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing
node_controller.go:425] Initializing node ip-10-117-161-37.ap-southeast-2.compute.internal with cloud provider
node_controller.go:229] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing
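One way to confirm it is the node-lifecycle controller deleting the node (a hedged suggestion; the exact event reason can vary by version, but it is typically DeletingNode) is to watch Node events while the worker joins:

kubectl get events --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp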
Configuration:

containers:
- args:
  - '--v=2'
  - '--cloud-provider=aws'
  - '--configure-cloud-routes=false'
  image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.30.1
/kind feature
I can see from CloudTrail that it was looking for the instance in the wrong region and receiving an error:
{
  "eventVersion": "1.09",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": ":i-07143150147441be0",
    "arn": "arn:aws:sts:::assumed-role//i-07143150147441be0",
    "accountId": "",
    "accessKeyId": "",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "principalId": "",
        "arn": "arn:aws:iam:::role/",
        "accountId": "982008609023",
        "userName": ""
      },
      "attributes": {
        "creationDate": "2024-06-03T17:16:33Z",
        "mfaAuthenticated": "false"
      },
      "ec2RoleDelivery": "2.0"
    }
  },
  "eventTime": "2024-06-03T21:58:18Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "DescribeInstances",
  "awsRegion": "us-west-2",
  "sourceIPAddress": "",
  "userAgent": "kubernetes/v1.26.13 aws-sdk-go/1.44.116 (go1.20.13; linux; amd64)",
  "errorCode": "Client.InvalidInstanceID.NotFound",
  "errorMessage": "The instance ID 'i-03c4ae677928450eb' does not exist",
  "requestParameters": {
    "instancesSet": {
      "items": [
        {
          "instanceId": "i-03c4ae677928450eb"
        }
      ]
    },
    "filterSet": {}
  },
  "responseElements": null,
  "requestID": "474a349c-7107-4df1-9e5e-489a1bc9b606",
  "eventID": "b7b55185-9287-42a7-bad1-6fa628fa05a6",
  "readOnly": true,
  "eventType": "AwsApiCall",
  "managementEvent": true,
  "recipientAccountId": "982008609023",
  "eventCategory": "Management",
  "tlsDetails": {
    "tlsVersion": "TLSv1.3",
    "cipherSuite": "TLS_AES_128_GCM_SHA256",
    "clientProvidedHostHeader": "ec2.us-west-2.amazonaws.com"
  }
}
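For anyone wanting to reproduce this lookup, the failing calls can be pulled straight from CloudTrail (a hedged example; adjust the region and add a time window as needed):

aws cloudtrail lookup-events \
  --region us-west-2 \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeInstances \
  --query 'Events[].CloudTrailEvent'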
You can set the region using the cloud config: https://github.com/kubernetes/cloud-provider-aws/blob/master/pkg/providers/v1/config/config.go#L25. Can you try using that?
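For example, a minimal cloud config might look like this (a sketch assuming the INI/gcfg format this file is parsed with and the Region key under [Global] that the linked config.go defines; the file path is arbitrary):

[Global]
Region = ap-southeast-2

and the CCM would be pointed at it with an extra arg:

- '--cloud-config=/etc/kubernetes/cloud.config'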
Setting the region in the config won't let you handle instances in multiple regions, but it will let you, for example, run the AWS CCM on-prem or in another region.
The AWS CCM assumes in many places that your resources are in a single region. I'm not necessarily opposed to changing this in the future, but it would require changes far beyond the DescribeInstances calls.
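To make the request concrete, here is a rough Go sketch of the direction it implies (not the CCM's actual code): derive the region from the node's providerID (aws:///<az>/<instance-id>) and pin the EC2 client to that region before calling DescribeInstances. The helper names are hypothetical, and trimming the AZ suffix to get the region assumes standard zone names.

package main

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// regionFromProviderID derives the region and instance ID from a providerID
// of the form aws:///<availability-zone>/<instance-id>, e.g.
// aws:///ap-southeast-2a/i-03c4ae677928450eb. Dropping the trailing zone
// letter to get the region assumes standard AZ names.
func regionFromProviderID(providerID string) (region, instanceID string, err error) {
	parts := strings.Split(strings.TrimPrefix(providerID, "aws://"), "/")
	if len(parts) != 3 || len(parts[1]) < 2 || parts[2] == "" {
		return "", "", fmt.Errorf("unexpected providerID %q", providerID)
	}
	az := parts[1]
	return az[:len(az)-1], parts[2], nil
}

// describeInRegion looks the instance up in its own region instead of the
// controller's default region.
func describeInRegion(providerID string) (*ec2.Instance, error) {
	region, instanceID, err := regionFromProviderID(providerID)
	if err != nil {
		return nil, err
	}
	// Build an EC2 client pinned to the instance's region.
	sess, err := session.NewSession(aws.NewConfig().WithRegion(region))
	if err != nil {
		return nil, err
	}
	out, err := ec2.New(sess).DescribeInstances(&ec2.DescribeInstancesInput{
		InstanceIds: []*string{aws.String(instanceID)},
	})
	if err != nil {
		return nil, err
	}
	if len(out.Reservations) == 0 || len(out.Reservations[0].Instances) == 0 {
		return nil, fmt.Errorf("instance %s not found in %s", instanceID, region)
	}
	return out.Reservations[0].Instances[0], nil
}

func main() {
	inst, err := describeInRegion("aws:///ap-southeast-2a/i-03c4ae677928450eb")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("found instance in state:", aws.StringValue(inst.State.Name))
}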