docs: CustomResourceDefinition is mandatory? + A tip for kops users
Opened this issue · 2 comments
Hello,
I wanted to check my understanding on a couple things before I offer a PR.
CustomResourceDefinition is mandatory?
I have a k8s 1.8.0 cluster. Following the README I deployed k8s-snapshots (both v2.0 and dev), annotated a Persistent Volume, and hit this error:
2017-10-05T04:02:34.136420193Z requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://100.64.0.1:443/apis/k8s-snapshots.elsdoerfer.com/v1/snapshotrules?watch=true
`k8s-snapshots.elsdoerfer.com` was not in the output of `curl https://100.64.0.1:443/apis/`.
I deployed the CustomResourceDefinition from the "Manual snapshot rules" section later in the README and the error went away.
Is it expected that the CRD is mandatory, at least in k8s 1.7+? If so, I'll update the docs. If not, I can provide more detail about my setup if this is a bug worth investigating.
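For reference, here is a sketch of a CRD matching the 404 URL above. The group (`k8s-snapshots.elsdoerfer.com`), version (`v1`), and plural (`snapshotrules`) come straight from that URL; the remaining fields are my assumptions, and the authoritative manifest is the one in the README's "Manual snapshot rules" section:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # Name must be <plural>.<group>.
  name: snapshotrules.k8s-snapshots.elsdoerfer.com
spec:
  group: k8s-snapshots.elsdoerfer.com
  version: v1
  scope: Namespaced
  names:
    kind: SnapshotRule
    plural: snapshotrules
    singular: snapshotrule
```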
A tip for kops users
We use kops to manage our k8s cluster. k8s-snapshots didn't work out of the box due to a permissions issue. If you agree, I'd like to add a tip about this to the README for fellow kops users:
k8s-snapshots needs EBS and S3 permissions to take and save snapshots. Under the kops IAM Role scheme, only Masters have these permissions. The easiest solution is to run k8s-snapshots on a Master.
To run on a Master, we need to:
- overcome a Taint -- see https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
- specify that we require a Master -- see https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
To do this, add the following to the above manifest for the k8s-snapshots Deployment:
```yaml
spec:
  ...
  template:
    ...
    spec:
      ...
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/role: master
```
Thanks
k8s-snapshots is cool! :)
The note about kops is great; I would definitely merge that PR.
CustomResourceDefinition is not mandatory. You should be able to ignore the error in the logs just fine. Backup disks then need to be annotated as described in the docs.
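For illustration, an annotated PVC might look like this (a sketch; the name, size, and storage class are placeholders, and the deltas format is described in the README):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: couchdb-pvc
  annotations:
    # Snapshot schedule/retention as space-separated
    # ISO 8601 durations (see the README for semantics).
    backup.kubernetes.io/deltas: PT5M PT15M PT45M
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: gpii-default
```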
PR for kops tip in README: #35
On the CRD thing: You are correct that scheduled snapshots work in spite of the error (I must have failed to wait long enough during my testing :)).
However, the stack trace is kind of ominous. Here it is after starting k8s-snapshots:dev on my cluster without the CRD.
2017-10-08T15:04:13.047876109Z 2017-10-08T15:04:13.046730Z rule.heartbeat [k8s_snapshots.core] message=rule.heartbeat rules=None severity=INFO
2017-10-08T15:04:13.049515787Z 2017-10-08T15:04:13.048665Z kube-config.from-service-account [k8s_snapshots.context] message=kube-config.from-service-account severity=INFO
2017-10-08T15:04:13.07426379Z 2017-10-08T15:04:13.072241Z volume-event.received [k8s_snapshots.core] event_object={'kind': 'PersistentVolume', 'apiVersion': 'v1', 'metadata': {'name': 'couchdb-pv', 'selfLink': '/api/v1/persistentvolumes/couchdb-pv', 'uid': 'd9c3b5a7-ac39-11e7-85b1-061d4acbdfa0', 'resourceVersion': '6386', 'creationTimestamp': '2017-10-08T15:03:32Z', 'labels': {'failure-domain.beta.kubernetes.io/region': 'us-east-2', 'failure-domain.beta.kubernetes.io/zone': 'us-east-2a'}, 'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"couchdb-pv","namespace":""},"spec":{"accessModes":["ReadWriteOnce"],"awsElasticBlockStore":{"volumeID":"vol-026057a3270dc432b"},"capacity":{"storage":"5Gi"},"storageClassName":"gpii-default"}}\n', 'pv.kubernetes.io/bound-by-controller': 'yes'}}, 'spec': {'capacity': {'storage': '5Gi'}, 'awsElasticBlockStore': {'volumeID': 'vol-026057a3270dc432b'}, 'accessModes': ['ReadWriteOnce'], 'claimRef': {'kind': 'PersistentVolumeClaim', 'namespace': 'default', 'name': 'couchdb-pvc', 'uid': 'da4cdb5d-ac39-11e7-9d66-0adf15f46cfa', 'apiVersion': 'v1', 'resourceVersion': '6384'}, 'persistentVolumeReclaimPolicy': 'Retain', 'storageClassName': 'gpii-default'}, 'status': {'phase': 'Bound'}} event_type=ADDED message=volume-event.received: event_type='ADDED', event_object.metadata.name='couchdb-pv' severity=INFO
2017-10-08T15:04:13.23814782Z 2017-10-08T15:04:13.236536Z rule.added [k8s_snapshots.core] event_object={'kind': 'PersistentVolume', 'apiVersion': 'v1', 'metadata': {'name': 'couchdb-pv', 'selfLink': '/api/v1/persistentvolumes/couchdb-pv', 'uid': 'd9c3b5a7-ac39-11e7-85b1-061d4acbdfa0', 'resourceVersion': '6386', 'creationTimestamp': '2017-10-08T15:03:32Z', 'labels': {'failure-domain.beta.kubernetes.io/region': 'us-east-2', 'failure-domain.beta.kubernetes.io/zone': 'us-east-2a'}, 'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"couchdb-pv","namespace":""},"spec":{"accessModes":["ReadWriteOnce"],"awsElasticBlockStore":{"volumeID":"vol-026057a3270dc432b"},"capacity":{"storage":"5Gi"},"storageClassName":"gpii-default"}}\n', 'pv.kubernetes.io/bound-by-controller': 'yes'}}, 'spec': {'capacity': {'storage': '5Gi'}, 'awsElasticBlockStore': {'volumeID': 'vol-026057a3270dc432b'}, 'accessModes': ['ReadWriteOnce'], 'claimRef': {'kind': 'PersistentVolumeClaim', 'namespace': 'default', 'name': 'couchdb-pvc', 'uid': 'da4cdb5d-ac39-11e7-9d66-0adf15f46cfa', 'apiVersion': 'v1', 'resourceVersion': '6384'}, 'persistentVolumeReclaimPolicy': 'Retain', 'storageClassName': 'gpii-default'}, 'status': {'phase': 'Bound'}} event_type=ADDED message=rule.added: rule.name='pvc-couchdb-pvc' rule=Rule(name='pvc-couchdb-pvc', deltas=[datetime.timedelta(0, 300), datetime.timedelta(0, 900), datetime.timedelta(0, 2700)], backend='aws', disk=AWSDiskIdentifier(region='us-east-2', volume_id='vol-026057a3270dc432b'), source='/api/v1/namespaces/default/persistentvolumeclaims/couchdb-pvc') severity=INFO
2017-10-08T15:04:15.062602774Z 2017-10-08T15:04:15.061155Z volume-event.received [k8s_snapshots.core] event_object={'kind': 'PersistentVolumeClaim', 'apiVersion': 'v1', 'metadata': {'name': 'couchdb-pvc', 'namespace': 'default', 'selfLink': '/api/v1/namespaces/default/persistentvolumeclaims/couchdb-pvc', 'uid': 'da4cdb5d-ac39-11e7-9d66-0adf15f46cfa', 'resourceVersion': '6388', 'creationTimestamp': '2017-10-08T15:03:33Z', 'annotations': {'backup.kubernetes.io/deltas': 'PT5M PT15M PT45M', 'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"backup.kubernetes.io/deltas":"PT5M PT15M PT45M"},"name":"couchdb-pvc","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"storageClassName":"gpii-default"}}\n', 'pv.kubernetes.io/bind-completed': 'yes', 'pv.kubernetes.io/bound-by-controller': 'yes'}}, 'spec': {'accessModes': ['ReadWriteOnce'], 'resources': {'requests': {'storage': '5Gi'}}, 'volumeName': 'couchdb-pv', 'storageClassName': 'gpii-default'}, 'status': {'phase': 'Bound', 'accessModes': ['ReadWriteOnce'], 'capacity': {'storage': '5Gi'}}} event_type=ADDED message=volume-event.received: event_type='ADDED', event_object.metadata.name='couchdb-pvc' severity=INFO
2017-10-08T15:04:15.092107689Z 2017-10-08T15:04:15.090550Z rule.added [k8s_snapshots.core] event_object={'kind': 'PersistentVolumeClaim', 'apiVersion': 'v1', 'metadata': {'name': 'couchdb-pvc', 'namespace': 'default', 'selfLink': '/api/v1/namespaces/default/persistentvolumeclaims/couchdb-pvc', 'uid': 'da4cdb5d-ac39-11e7-9d66-0adf15f46cfa', 'resourceVersion': '6388', 'creationTimestamp': '2017-10-08T15:03:33Z', 'annotations': {'backup.kubernetes.io/deltas': 'PT5M PT15M PT45M', 'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"backup.kubernetes.io/deltas":"PT5M PT15M PT45M"},"name":"couchdb-pvc","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"5Gi"}},"storageClassName":"gpii-default"}}\n', 'pv.kubernetes.io/bind-completed': 'yes', 'pv.kubernetes.io/bound-by-controller': 'yes'}}, 'spec': {'accessModes': ['ReadWriteOnce'], 'resources': {'requests': {'storage': '5Gi'}}, 'volumeName': 'couchdb-pv', 'storageClassName': 'gpii-default'}, 'status': {'phase': 'Bound', 'accessModes': ['ReadWriteOnce'], 'capacity': {'storage': '5Gi'}}} event_type=ADDED message=rule.added: rule.name='pvc-couchdb-pvc' rule=Rule(name='pvc-couchdb-pvc', deltas=[datetime.timedelta(0, 300), datetime.timedelta(0, 900), datetime.timedelta(0, 2700)], backend='aws', disk=AWSDiskIdentifier(region='us-east-2', volume_id='vol-026057a3270dc432b'), source='/api/v1/namespaces/default/persistentvolumeclaims/couchdb-pvc') severity=INFO
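As an aside, the rule.added entries above show the annotation's ISO 8601 durations parsed into timedeltas (PT5M becomes `timedelta(0, 300)`, and so on). A minimal sketch of that parsing, purely for illustration (this is my own code, not k8s-snapshots' actual parser, which may use a library):

```python
import re
from datetime import timedelta

def parse_deltas(value: str) -> list:
    """Parse a space-separated list of simple ISO 8601 durations
    (e.g. 'PT5M PT15M PT45M') into timedelta objects."""
    units = {"H": "hours", "M": "minutes", "S": "seconds"}
    deltas = []
    for token in value.split():
        match = re.fullmatch(r"PT(\d+)([HMS])", token)
        if not match:
            raise ValueError(f"unsupported duration: {token}")
        number, unit = match.groups()
        deltas.append(timedelta(**{units[unit]: int(number)}))
    return deltas

# 'PT5M PT15M PT45M' -> 300s, 900s, 2700s, matching the rule.added log
print(parse_deltas("PT5M PT15M PT45M"))
```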
2017-10-08T15:04:16.066642947Z 2017-10-08T15:04:16.064487Z watch-resources.worker.error [k8s_snapshots.kube] message=watch-resources.worker.error resource_type_name=SnapshotRule severity=ERROR
2017-10-08T15:04:16.066674797Z Traceback (most recent call last):
2017-10-08T15:04:16.066679198Z File "/usr/local/lib/python3.6/site-packages/k8s_snapshots-0.0.0-py3.6.egg/k8s_snapshots/kube.py", line 181, in worker
2017-10-08T15:04:16.066682547Z for event in sync_iterator:
2017-10-08T15:04:16.066685883Z File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 156, in object_stream
2017-10-08T15:04:16.066688995Z self.api.raise_for_status(r)
2017-10-08T15:04:16.066691617Z File "/usr/local/lib/python3.6/site-packages/pykube/http.py", line 99, in raise_for_status
2017-10-08T15:04:16.06669461Z resp.raise_for_status()
2017-10-08T15:04:16.066698262Z File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
2017-10-08T15:04:16.066701479Z raise HTTPError(http_error_msg, response=self)
2017-10-08T15:04:16.066713656Z requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://100.64.0.1:443/apis/k8s-snapshots.elsdoerfer.com/v1/snapshotrules?watch=true
This looks a little scary, so I suggest suppressing the error (e.g. "No CustomResourceDefinition found, but that is ok" instead of a dozen lines of traceback). I don't have time to PR it myself, though I can make a separate issue for this problem if you like.
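A sketch of the kind of suppression I have in mind (hypothetical code; the function and logger names are mine and do not reflect the actual k8s_snapshots/kube.py internals):

```python
import logging

import requests

log = logging.getLogger("k8s_snapshots.kube")

def watch_snapshot_rules(stream):
    """Yield events from a watch stream, but downgrade a 404 on the
    SnapshotRule custom resource to a friendly log line instead of a
    full traceback."""
    try:
        for event in stream:
            yield event
    except requests.exceptions.HTTPError as exc:
        if exc.response is not None and exc.response.status_code == 404:
            log.info(
                "No SnapshotRule CustomResourceDefinition found, "
                "but that is ok; only annotation-based rules will be used."
            )
        else:
            raise
```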
Regardless, Google can now find this error message in this issue so hopefully future users will be less frightened of this stack trace than I was :).