Kops on a disconnected environment
/kind bug
1. What kops version are you running? The command kops version will display this information.
1.26.3
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
1.26.4
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Manage your own security groups and allow egress traffic only for internal communication (block 0.0.0.0/0 and allow the VPC CIDR), then run:
kops update cluster **** --yes --lifecycle-overrides SecurityGroup=Ignore,SecurityGroupRule=Ignore
5. What happened after the commands executed?
The command exceeded its timeout.
6. What did you expect to happen?
After SSHing into the master node, I can see that the nodeup process exits with the following error:
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.305209 1035 s3context.go:192] unable to get bucket location from region "us-east-1"; scanning all regions: RequestError: send request failed
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: caused by: Get "https://s3.dualstack.us-east-1.amazonaws.com/r*****?location=": dial tcp 52.217.230.168:443: i/o timeout
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374846 1035 s3context.go:298] Querying S3 for bucket location for ****
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374904 1035 s3context.go:303] Doing GetBucketLocation in "eu-west-3"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374911 1035 s3context.go:303] Doing GetBucketLocation in "us-west-2"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374930 1035 s3context.go:303] Doing GetBucketLocation in "eu-west-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.375066 1035 s3context.go:303] Doing GetBucketLocation in "ca-central-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378346 1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-3"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378520 1035 s3context.go:303] Doing GetBucketLocation in "us-east-2"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378718 1035 s3context.go:303] Doing GetBucketLocation in "eu-south-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378767 1035 s3context.go:303] Doing GetBucketLocation in "us-west-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378885 1035 s3context.go:303] Doing GetBucketLocation in "eu-central-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378406 1035 s3context.go:303] Doing GetBucketLocation in "ap-south-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378418 1035 s3context.go:303] Doing GetBucketLocation in "eu-north-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378439 1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-2"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378454 1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378472 1035 s3context.go:303] Doing GetBucketLocation in "us-east-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378481 1035 s3context.go:303] Doing GetBucketLocation in "sa-east-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378490 1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-1"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378498 1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-2"
Apr 5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.379255 1035 s3context.go:303] Doing GetBucketLocation in "eu-west-2"
Apr 5 08:03:29 ip-172-20-10-182 nodeup[1035]: W0405 08:03:29.375004 1035 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://****/******/cluster-completed.spec": Could not retrieve location for AWS bucket *****
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-04-05T07:23:05Z"
  name: ********
spec:
  additionalPolicies: {}
  api:
    loadBalancer:
      class: Classic
      securityGroupOverride: sg-*****
      type: Public
  assets:
    containerRegistry: *******.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/******
  authorization:
    rbac: {}
  cloudProvider: aws
  configBase: s3://*****/******
  containerd:
    configOverride: |2
      version = 2
      [plugins]
        [plugins."io.containerd.grpc.v1.cri"]
          sandbox_image = "*****.dkr.ecr.us-east-1.amazonaws.com/kops/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097"
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors."*******.dkr.ecr.us-east-1.amazonaws.com"]
            endpoint = ["https://******.dkr.ecr.us-east-1.amazonaws.com"]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."******.dkr.ecr.us-east-1.amazonaws.com".auth]
            username = "AWS"
            password = "******"
          [plugins."io.containerd.grpc.v1.cri".containerd]
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                runtime_type = "io.containerd.runc.v2"
                [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                  SystemdCgroup = true
  dnsZone: *****
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-1
      name: master-1
    name: main
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: true
  kubelet:
    anonymousAuth: false
  kubernetesVersion: 1.26.4
  masterPublicName: api.*****
  networkCIDR: 172.20.0.0/16
  networkID: vpc-*****
  networking:
    calico: {}
  nodeTerminationHandler:
    enableSpotInterruptionDraining: false
    enabled: false
  nonMasqueradeCIDR: 100.64.0.0/10
  sshKeyName: *****
  subnets:
  - cidr: 172.20.10.0/24
    id: subnet-*****
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: master-1
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      kops.k8s.io/kops-controller-pki: ""
      node-role.kubernetes.io/control-plane: ""
      node.kubernetes.io/exclude-from-external-load-balancers: ""
    taints:
    - node-role.kubernetes.io/control-plane=:NoSchedule
  machineType: m5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  role: Master
  securityGroupOverride: ******
  subnets:
  - us-east-1b
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: node
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: ""
  machineType: c6i.2xlarge
  manager: CloudGroup
  maxSize: 2
  minSize: 2
  nodeLabels:
    nvidia.com/gpu.deploy.dcgm-exporter: "true"
    nvidia.com/gpu.deploy.device-plugin: "true"
  packages:
  - nfs-common
  role: Node
  securityGroupOverride: sg-*****
  subnets:
  - us-east-1b
I have created an Interface-type VPC endpoint for S3, but none of its DNS records cover the dualstack names:
*.vpce-*****.s3.us-east-1.vpce.amazonaws.com
*.vpce-*****-us-east-1b.s3.us-east-1.vpce.amazonaws.com
s3.us-east-1.amazonaws.com
*.s3.us-east-1.amazonaws.com
*.s3-accesspoint.us-east-1.amazonaws.com
*.s3-control.us-east-1.amazonaws.com
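To make the gap concrete, here is a small sketch (not kops code, just an illustration using Python's glob-style matching) comparing the DNS patterns listed above against the dualstack host that nodeup dials in the log:

```python
from fnmatch import fnmatch

# DNS record patterns advertised by the Interface-type S3 endpoint (from the list above).
ENDPOINT_PATTERNS = [
    "s3.us-east-1.amazonaws.com",
    "*.s3.us-east-1.amazonaws.com",
    "*.s3-accesspoint.us-east-1.amazonaws.com",
    "*.s3-control.us-east-1.amazonaws.com",
]

def covered(host: str) -> bool:
    """True if the hostname matches one of the endpoint's DNS records."""
    return any(fnmatch(host, pattern) for pattern in ENDPOINT_PATTERNS)

# Host from the failing request in the nodeup log:
#   Get "https://s3.dualstack.us-east-1.amazonaws.com/<bucket>?location="
print(covered("s3.dualstack.us-east-1.amazonaws.com"))  # → False (not covered)
print(covered("bucket.s3.us-east-1.amazonaws.com"))     # → True  (covered)
```

So any request the SDK sends to the dualstack hostname bypasses the endpoint's private DNS entirely.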
It's not clear to me how this is a kops bug?
There is no way to set up kops for a disconnected environment... I can open a feature request if you want.
There is a way to install kops in a disconnected environment, but you must copy all assets first. It can be installed without any internet connectivity; you only need connectivity to a single object storage.
https://kops.sigs.k8s.io/operations/asset-repository/
You also need to use kops channel: none (I cannot see this in your spec at all, so it is not none in your case; the default value is stable).
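For reference, a minimal sketch of that setting in the cluster spec (assuming the `spec.channel` field described in the kops docs; "none" should stop kops from fetching the stable channel file over the internet):

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  # "none" disables fetching the channel file; the default is "stable".
  channel: "none"
```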
The dualstack addresses come from https://github.com/kubernetes/kops/blob/release-1.26/util/pkg/vfs/s3fs.go#L511-L515
@zetaab Although I have added all asset files and containers to S3 and ECR and configured kops to use them, the nodeup logs still show an error when retrieving cluster-completed.spec from S3, even with an S3 VPC endpoint configured.
That's because kops uses the s3://bucket-name scheme, while the S3 VPC endpoint exposes the full S3 DNS name (bucket-name.s3.us-east-1.amazonaws.com).
As a result, kops cannot be used in a disconnected environment on AWS.
W0412 06:49:07.558115 1040 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://kops-state-****/*****/cluster-completed.spec": file does not exist
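To illustrate the mismatch being described, here is a hypothetical sketch (bucket and key names are placeholders, not the real state store) of how an s3:// state-store path maps onto the two HTTPS hosts involved: the regional virtual-hosted name that the VPC endpoint's DNS records cover, versus the dualstack path-style GetBucketLocation URL seen in the timeout above:

```python
def s3_hosts(uri: str, region: str = "us-east-1") -> dict:
    """Map an s3://bucket/key URI to the HTTPS URLs a client may dial."""
    bucket, _, key = uri.removeprefix("s3://").partition("/")
    return {
        # Virtual-hosted-style request: matched by *.s3.us-east-1.amazonaws.com,
        # so it resolves through the VPC endpoint's private DNS.
        "regional": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
        # Dualstack path-style GetBucketLocation, as in the nodeup log:
        # not matched by any of the endpoint's DNS records.
        "dualstack": f"https://s3.dualstack.{region}.amazonaws.com/{bucket}?location=",
    }

hosts = s3_hosts("s3://kops-state-example/cluster/cluster-completed.spec")
print(hosts["regional"])
print(hosts["dualstack"])
```

Only the second URL goes to the dualstack hostname, which is why the request times out once 0.0.0.0/0 egress is blocked.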