Lesson 102: Autoscaler in 'Not Ready' State for AWS EKS 1.25
Hi Anton,
I have created the AWS EKS resources based on this, and the cluster that gets created is on version 1.25. I'm also able to apply all of the files from this directory after updating the role ARN, the container image (to `registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.1`), and the cluster name.
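In case it helps, those edits can be verified in place after `kubectl apply`; the resource names here assume the upstream cluster-autoscaler example manifests:

```sh
# Container image the deployment is actually configured with.
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# Full container command, to check the cluster name passed to the
# autoscaler (e.g. in the node-group auto-discovery tag).
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}'
```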
But when I run `kubectl get all -A -n kube-system`, I get:
```
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
default       pod/nginx-7f85bb5c99-77kdl                      1/1     Running   0          153m
kube-system   pod/aws-node-drj5z                              1/1     Running   0          21h
kube-system   pod/coredns-7975d6fb9b-b9nrw                    1/1     Running   0          21h
kube-system   pod/coredns-7975d6fb9b-qdk4r                    1/1     Running   0          21h
kube-system   pod/kube-proxy-86tw4                            1/1     Running   0          21h

NAMESPACE     NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP                             PORT(S)         AGE
default       service/kubernetes   ClusterIP      172.20.0.1    <none>                                  443/TCP         21h
default       service/private-lb   LoadBalancer   1.2.3.4       long-text.elb.us-east-1.amazonaws.com   80:30528/TCP    137m
default       service/public-lb    LoadBalancer   1.2.3.4       long-text.elb.us-east-1.amazonaws.com   80:32750/TCP    137m
kube-system   service/kube-dns     ClusterIP      172.20.0.10   <none>                                  53/UDP,53/TCP   21h

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/aws-node      1         1         1       1            1           <none>          21h
kube-system   daemonset.apps/kube-proxy    1         1         1       1            1           <none>          21h

NAMESPACE     NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
default       deployment.apps/nginx                1/1     1            1           153m
kube-system   deployment.apps/cluster-autoscaler   0/1     0            0           8m58s
kube-system   deployment.apps/coredns              2/2     2            2           21h

NAMESPACE     NAME                                            DESIRED   CURRENT   READY   AGE
default       replicaset.apps/nginx-7f85bb5c99                1         1         1       153m
kube-system   replicaset.apps/cluster-autoscaler-6cf6d855c5   1         0         0       8m58s
kube-system   replicaset.apps/coredns-7975d6fb9b              2         2         2       21h
```
And when I try to tail the logs with `kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler`, it times out with `error: timed out waiting for the condition`.

How can I fix this error?
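For reference: the ReplicaSet output above shows 1 DESIRED but 0 CURRENT, meaning no pod was ever created, so `kubectl logs` has nothing to attach to and times out. A minimal diagnostic sketch, assuming the `app=cluster-autoscaler` label from the upstream manifests:

```sh
# ReplicaSet events usually name the blocker, e.g. a missing
# service account or a rejected pod spec.
kubectl -n kube-system describe deployment cluster-autoscaler
kubectl -n kube-system describe replicaset -l app=cluster-autoscaler

# Namespace events, newest last, as a fallback.
kubectl -n kube-system get events --sort-by=.lastTimestamp
```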
Additionally, based on the AWS autoscaling documentation, I modified this file to:
data "aws_iam_policy_document" "eks_cluster_autoscaler_assume_role_policy" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
effect = "Allow"
condition {
test = "StringEquals"
variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
values = ["system:serviceaccount:kube-system:cluster-autoscaler"]
}
principals {
identifiers = [aws_iam_openid_connect_provider.eks.arn]
type = "Federated"
}
}
}
resource "aws_iam_role" "eks_cluster_autoscaler" {
assume_role_policy = data.aws_iam_policy_document.eks_cluster_autoscaler_assume_role_policy.json
name = "eks-cluster-autoscaler"
}
resource "aws_iam_policy" "eks_cluster_autoscaler" {
name = "eks-cluster-autoscaler"
policy = jsonencode({
Statement = [
{
Action = [
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeAutoScalingGroups",
"ec2:DescribeLaunchTemplateVersions",
"autoscaling:DescribeTags",
"autoscaling:DescribeLaunchConfigurations",
"ec2:DescribeInstanceTypes",
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
]
Effect = "Allow"
Resource = "*"
Condition = {
"StringEquals" = {
"aws:ResourceTag/k8s.io/cluster-autoscaler/${var.cluster_name}": "owned"
}
}
}
]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_autoscaler_attach" {
role = aws_iam_role.eks_cluster_autoscaler.name
policy_arn = aws_iam_policy.eks_cluster_autoscaler.arn
}
output "eks_cluster_autoscaler_arn" {
value = aws_iam_role.eks_cluster_autoscaler.arn
}
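For anyone comparing notes: the `eks_cluster_autoscaler_arn` output is the value that has to end up in the service account's IRSA annotation, or the pod cannot assume the role. A sketch with kubectl, where the account ID is a placeholder and the service account name again assumes the upstream manifests:

```sh
# Placeholder ARN for illustration; substitute the Terraform output value.
kubectl -n kube-system annotate serviceaccount cluster-autoscaler \
  eks.amazonaws.com/role-arn=arn:aws:iam::111122223333:role/eks-cluster-autoscaler \
  --overwrite

# Confirm the annotation is present.
kubectl -n kube-system describe serviceaccount cluster-autoscaler
```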
But I'm still seeing the same error.
Figured out the fix!
I performed a clean run, and the platform and deployments are now working correctly.
This is my pull request: #155
Thanks for the PR; it's my code that's slightly outdated.
Merged
Thank you!