Add kubernetes type clustering option
melkypie opened this issue · 12 comments
Currently the only KV storage we can use for clustering is Consul. A nice feature would be to add a Kubernetes type that stores all of the key/value information in Kubernetes objects, similar to how argocd does it. This would spare the user from having to maintain another KV storage solution.
I know this is quite a big ask, but I have already managed to deploy gnmic clustering on Kubernetes with Consul, and having this would allow me to not worry about having another KV storage. If needed, I could write a guide on how to deploy it to Kubernetes and help with the serviceaccount/rolebinding/role objects and other Kubernetes related things.
I'm not sure if I understand correctly but this sounds like it needs a separate component acting as a k8s controller for gnmic. It would be responsible for managing the state of the cluster.
It could be that I'm overthinking this.
How does argocd store an instance state in k8s? Do the objects have a TTL? Can the TTL be refreshed?
A guide to deploy gnmic on k8s will be very helpful, it would fit nicely with the docs.
Argocd stores most of its configuration in Secrets (but I am sure ConfigMaps would also be fine for gnmic) and Custom Resource Definitions (which would be too much for the simple use case in gnmic), which are basically key-value stores. They don't have a specific way to set a TTL, but I am sure you could just create an entry in the specific ConfigMap with the TTL value if that is needed.
From what I can currently see, the only things stored in Consul are the leader of the cluster and which instance each target belongs to, which is something that ConfigMaps in k8s can easily hold. The service availability checking feature of Consul also exists in k8s.
For the guide, I will start working on it right away.
Thanks for working on the guide and thanks for the details about argocd.
Consul does a little bit more than just storage.
What I meant by TTL is a way for a key (leader or target ownership) to be deleted after a certain duration if its owner does not refresh it. Consul handles this natively. The key TTL mechanism makes leader election/reelection as well as target ownership locking/transfer easy.
Consul also allows running a long-lived request to get notified about service changes, basically removing the need for periodic polls to discover instances of a certain service.
About using k8s as KV store for clustering, I think ownerReference can be used for leader election and target ownership:
- At startup, each gNMIc instance/pod tries to create a ConfigMap with a well-known predefined name; the first one to create it becomes the leader. The ones that failed to become the leader periodically check if the ConfigMap still exists and try to create it if it doesn't; the one that succeeds takes over as the new leader.
- Then, same as clustering with Consul, the leader proceeds to dispatch targets to available gNMIc services
- When assigned a target, each instance creates a ConfigMap indicating that it claims ownership over that target and proceeds with creating the gNMI subscriptions.
- Each created ConfigMap will have its ownerReference field populated with a reference to the gNMIc instance that created it. If the owner referenced by a ConfigMap no longer exists, the ConfigMap is deleted by the k8s GC.
- The leader periodically goes over the list of ConfigMaps to make sure that each target has a corresponding ConfigMap with an existing owner. If a ConfigMap for a certain target is missing, the leader reassigns that target to an available gNMIc instance.
- A liveness probe might be needed to detect a failed gNMIc pod and delete it.
I believe this should work, open to comments and suggestions, I might have missed something or expected a piece to work differently from its real behavior.
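The create-if-absent election proposed in the list above can be sketched as follows. This is only an illustration under stated assumptions: `objectStore` is a hypothetical in-memory stand-in for the API server (a create fails when the name is already taken), `deleteOwnedBy` stands in for garbage collection of objects whose owner is gone, and the object name `gnmic-leader` is made up.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("already exists")

// objectStore is a hypothetical stand-in for the API server:
// create fails if an object with the same name already exists.
type objectStore struct {
	owners map[string]string // object name -> instance that created it
}

func (s *objectStore) create(name, owner string) error {
	if _, ok := s.owners[name]; ok {
		return errAlreadyExists
	}
	s.owners[name] = owner
	return nil
}

// deleteOwnedBy mimics garbage collection of objects whose owner is gone.
func (s *objectStore) deleteOwnedBy(owner string) {
	for name, o := range s.owners {
		if o == owner {
			delete(s.owners, name)
		}
	}
}

// tryLead reports whether instance became the leader by being the first
// to create the well-known object.
func tryLead(s *objectStore, instance string) bool {
	return s.create("gnmic-leader", instance) == nil
}

func main() {
	s := &objectStore{owners: map[string]string{}}
	fmt.Println(tryLead(s, "gnmic-ss-0")) // first to create: leader
	fmt.Println(tryLead(s, "gnmic-ss-1")) // object exists: not leader
	s.deleteOwnedBy("gnmic-ss-0")         // gnmic-ss-0 goes away
	fmt.Println(tryLead(s, "gnmic-ss-1")) // retry succeeds: new leader
}
```

Target ownership would follow the same pattern with one object per target instead of a single well-known name.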
I will give this a try and get back to you.
@melkypie if you can give the 0.25.0-beta release a try, you will be able to test k8s based clustering.
It uses leases as a locking mechanism.
The deployment method is similar to what you already did with Consul except:
- Obviously no need to deploy a Consul cluster
- The clustering part in the configMap becomes:
clustering:
  cluster-name: cluster1
  targets-watch-timer: 30s
  leader-wait-timer: 30s
  locker:
    type: k8s
    namespace: gnmic # defaults to "default"
- RBAC:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: gnmic
  name: svc-pod-lease-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gnmic-user
  namespace: gnmic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-leases
  namespace: gnmic
subjects:
  - kind: ServiceAccount
    name: gnmic-user
roleRef:
  kind: Role
  name: svc-pod-lease-reader
  apiGroup: rbac.authorization.k8s.io
- Add the created service account to the SS spec:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gnmic-ss
  labels:
    app: gnmic
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gnmic
  serviceName: gnmic-svc
  template:
    metadata:
      labels:
        app: gnmic
    spec:
      containers:
        - args:
            - subscribe
            - --config
            - /app/config.yaml
          image: gnmic:0.0.0-k
          imagePullPolicy: IfNotPresent
          name: gnmic
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - all
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          ports:
            - containerPort: 9804
              name: prom-output
              protocol: TCP
            - containerPort: 7890
              name: gnmic-api
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 400Mi
            requests:
              cpu: 50m
              memory: 200Mi
          envFrom:
            - secretRef:
                name: gnmic-login
          env:
            - name: GNMIC_API
              value: :7890
            - name: GNMIC_CLUSTERING_INSTANCE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GNMIC_CLUSTERING_SERVICE_ADDRESS
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local"
            - name: GNMIC_OUTPUTS_OUTPUT1_LISTEN
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local:9804"
          volumeMounts:
            - mountPath: /app/config.yaml
              name: config
              subPath: config.yaml
      serviceAccountName: gnmic-user # <-- service account name created earlier
      volumes:
        - configMap:
            defaultMode: 420
            name: gnmic-config
          name: config
- Add a service for the gNMIc instance API, this service HAS to be called ${cluster-name}-gnmic-api:
apiVersion: v1
kind: Service
metadata:
  name: cluster1-gnmic-api
  labels:
    app: gnmic
spec:
  ports:
    - name: http
      port: 7890
      protocol: TCP
      targetPort: 7890
  selector:
    app: gnmic
  clusterIP: None
I did some tests on my side, it seems to be stable even when shrinking the SS size:
karim@kss:~/github.com/karimra/gnmic$ kubectl get leases
NAME HOLDER AGE
gnmic-cluster1-leader gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.15 gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.16 gnmic-ss-1 2d4h
gnmic-cluster1-targets-172.20.20.17 gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.18 gnmic-ss-2 2d4h
gnmic-cluster1-targets-172.20.20.19 gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.20 gnmic-ss-2 2d4h
gnmic-cluster1-targets-172.20.20.21 gnmic-ss-2 2d4h
gnmic-cluster1-targets-172.20.20.22 gnmic-ss-2 2d4h
gnmic-cluster1-targets-172.20.20.23 gnmic-ss-1 2d4h
gnmic-cluster1-targets-172.20.20.24 gnmic-ss-1 2d4h
gnmic-cluster1-targets-172.20.20.25 gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.26 gnmic-ss-1 2d4h
gnmic-cluster1-targets-172.20.20.27 gnmic-ss-0 2d4h
gnmic-cluster1-targets-172.20.20.28 gnmic-ss-2 2d4h
gnmic-cluster1-targets-172.20.20.29 gnmic-ss-1 2d4h
karim@kss:~/github.com/karimra/gnmic$
There is no mechanism to redistribute the targets when growing the SS.
It would be helpful if you could give it a go to see if it fits your needs.
Will do, I won't be able to get back to you until Tuesday as I don't have access to a cluster where I could test out gnmic due to the Easter holidays.
I gave it a try.
From my experience, it was only able to assign a target to the leader of the cluster; the other, non-leader instances seem to fail to acquire locks for the targets assigned to them.
So it manages to assign 1 target (the one the leader assigns to itself after failing to assign it to other instances) and then keeps failing to assign the other targets because their locks are not acquired, although if you look at the leases you can see that the lease has been created.
I am testing this on an RKE2 cluster with 3 masters and 2 workers, kubernetes version: v1.22.5+rke2r1
melkypie:~/projects/kubernetes$ kubectl get leases -n gnmic
NAME HOLDER AGE
gnmic-ip-net-monit1-leader gnmic-ss-2 30m
gnmic-ip-net-monit1-targets-device1 gnmic-ss-2 29m
gnmic-ip-net-monit1-targets-device2 gnmic-ss-0 11s
StatefulSet.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gnmic-ss
  namespace: gnmic
  labels:
    app: gnmic
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gnmic
  serviceName: gnmic-svc
  template:
    metadata:
      labels:
        app: gnmic
        version: 0.25.0-beta
    spec:
      containers:
        - args:
            - subscribe
            - --config
            - /app/config.yaml
          image: ghcr.io/karimra/gnmic:0.25.0-beta-scratch
          imagePullPolicy: IfNotPresent
          name: gnmic
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - all
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          ports:
            - containerPort: 9804
              name: prom-output
              protocol: TCP
            - containerPort: 7890
              name: gnmic-api
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 400Mi
            requests:
              cpu: 50m
              memory: 200Mi
          envFrom:
            - secretRef:
                name: gnmic-login
          env:
            - name: GNMIC_API
              value: :7890
            - name: GNMIC_CLUSTERING_INSTANCE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GNMIC_CLUSTERING_SERVICE_ADDRESS
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local"
            - name: GNMIC_OUTPUTS_PROM_LISTEN
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local:9804"
          volumeMounts:
            - mountPath: /app/config.yaml
              name: config
              subPath: config.yaml
      serviceAccountName: gnmic-user
      volumes:
        - configMap:
            defaultMode: 420
            name: gnmic-config
          name: config
RBAC.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: gnmic
  name: svc-pod-lease-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gnmic-user
  namespace: gnmic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-leases
  namespace: gnmic
subjects:
  - kind: ServiceAccount
    name: gnmic-user
roleRef:
  kind: Role
  name: svc-pod-lease-reader
  apiGroup: rbac.authorization.k8s.io
Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gnmic-svc
  namespace: gnmic
  labels:
    app: gnmic
spec:
  ports:
    - name: http
      port: 9804
      protocol: TCP
      targetPort: 9804
  selector:
    app: gnmic
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: cluster1-gnmic-api
  namespace: gnmic
spec:
  ports:
    - name: http
      port: 7890
      protocol: TCP
      targetPort: 7890
  selector:
    app: gnmic
  clusterIP: None
ConfigMap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gnmic-config
  namespace: gnmic
data:
  config.yaml: |
    insecure: true
    encoding: json_ietf
    log: true
    clustering:
      cluster-name: cluster1
      targets-watch-timer: 30s
      leader-wait-timer: 30s
      locker:
        type: k8s
        namespace: gnmic
    targets:
      device1:
        address: device1:6030
        subscriptions:
          - general
      device2:
        address: device2:6030
        subscriptions:
          - general
      device3:
        address: device3:6030
        subscriptions:
          - general
      device4:
        address: device4:6030
        subscriptions:
          - general
    subscriptions:
      general:
        paths:
          - /interfaces/interface/state/counters
        stream-mode: sample
        sample-interval: 5s
    outputs:
      prom:
        type: prometheus
        strings-as-labels: true
Also adding sanitized log files (I also noticed that gnmic seems to be logging plaintext passwords, it would be great if it did not do that):
gnmic-ss-1.log
gnmic-ss-0.log
gnmic-ss-2.log
The logs are from trying it a second time, so you can't see where it created the device1 lease.
I'm not sure what is going wrong here, I retested with a single node as well as with 1 control and 2 worker nodes (1.23.4 and 1.22.7).
I'm using kind clusters.
The leader timing out and reassigning the target to another node means that the selected instance was not able to create the lease and/or maintain it.
The leader assigning the target to itself I understood, but yeah, the most interesting part is that the lock/lease is not being recognized by the leader, although if you look at the leases it is there.
My other thought was that maybe something was wrong with RBAC, but when I get a pod with kubectl using that same serviceaccount (the one that gnmic uses) it can access all of the leases, so I'm not sure what is going on there.
I will give it another try tomorrow and try deleting the whole namespace before doing it.
Finally got around to testing it and I found the error!
I had a cluster name with a - in it. So when it tries to list the leases, it replaces the - in the cluster name with / in here:
gnmic/lockers/k8s_locker/k8s_registration.go
Line 106 in 3caa03e
It is my fault for not providing the exact configs I used to deploy, as it might have been easier to debug then.
EDIT: Also seems to be the case with targets having - in them.
That part actually replaces / with -.
But I think you put your finger on the problem; the leader won't be able to retrieve a lock if the cluster name or the target name contains a -. Thanks for sharing your findings.
The leader keeps a mapping of the transformed key (/ --> -) to the original key to be able to revert it back, but it can only map back the keys it locked itself (silly me), that's why only the leader locks are successful.
I was hoping to get away with this to maintain compatibility with the consul locker and not have to rewrite the global clustering code.
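A small sketch of why the name cannot simply be converted back without such a mapping. The function names `flatten` and `naiveRestore` are hypothetical and this is not the code from k8s_registration.go; the key layout follows the original-key format gnmic/<cluster>/targets/<target>, and the cluster name ip-net-monit1 is inferred from the lease names posted earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// flatten turns a slash-separated lock key into a name usable for a Lease.
func flatten(key string) string {
	return strings.ReplaceAll(key, "/", "-")
}

// naiveRestore tries to undo flatten. It cannot tell a dash that was part
// of the original key from a dash that used to be a slash.
func naiveRestore(name string) string {
	return strings.ReplaceAll(name, "-", "/")
}

func main() {
	key := "gnmic/ip-net-monit1/targets/device1"
	name := flatten(key)
	fmt.Println(name)                      // gnmic-ip-net-monit1-targets-device1
	fmt.Println(naiveRestore(name))        // gnmic/ip/net/monit1/targets/device1
	fmt.Println(naiveRestore(name) == key) // false
}
```

The transformation is lossy as soon as a cluster or target name contains a dash, so the original key has to be stored somewhere rather than recomputed.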
I got rid of the key mapping and added the original key as an annotation to the lease, that's how the List function will be able to return the list of original keys given a prefix.
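As an illustration of the annotation approach, here is a minimal sketch of a prefix listing over simplified lease objects. The `lease` struct and `listKeys` function are hypothetical stand-ins, not the actual locker code; only the annotation name original-key is taken from the lease shown in this thread, and the leader key in the example is made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lease is a simplified stand-in for a Lease object: only the fields used here.
type lease struct {
	name        string
	annotations map[string]string
}

// listKeys returns the original keys of all leases whose original-key
// annotation starts with prefix. Leases without the annotation are skipped.
func listKeys(leases []lease, prefix string) []string {
	var keys []string
	for _, l := range leases {
		k, ok := l.annotations["original-key"]
		if ok && strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // stable output regardless of listing order
	return keys
}

func main() {
	leases := []lease{
		{name: "gnmic-cluster-1-leader", annotations: map[string]string{"original-key": "gnmic/cluster-1/leader"}},
		{name: "gnmic-cluster-1-targets-172.20.20.2", annotations: map[string]string{"original-key": "gnmic/cluster-1/targets/172.20.20.2"}},
		{name: "unrelated"},
	}
	fmt.Println(listKeys(leases, "gnmic/cluster-1/targets"))
}
```

Because the original key travels with the lease itself, any instance can list the keys, not just the one that created the lock.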
I did some tests with cluster name cluster-1 and it seems to be fine, a target lease looks like this:
Name:         gnmic-cluster-1-targets-172.20.20.2
Namespace:    gnmic
Labels:       app=gnmic
              gnmic-cluster-1-targets-172.20.20.2=gnmic-ss-2
Annotations:  original-key: gnmic/cluster-1/targets/172.20.20.2
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2022-04-26T05:31:05Z
  Managed Fields:
    API Version:  coordination.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:original-key:
        f:labels:
          .:
          f:app:
          f:gnmic-cluster-1-targets-172.20.20.2:
      f:spec:
        f:acquireTime:
        f:holderIdentity:
        f:leaseDurationSeconds:
        f:renewTime:
    Manager:         gnmic
    Operation:       Update
    Time:            2022-04-26T05:31:05Z
  Resource Version:  1876693
  UID:               ea0e4259-b39a-47f2-a62a-60dfb64cccb1
Spec:
  Acquire Time:            2022-04-26T05:39:53.085031Z
  Holder Identity:         gnmic-ss-2
  Lease Duration Seconds:  10
  Renew Time:              2022-04-26T05:39:53.085031Z
Events:                    <none>
I will issue a release shortly with this code so you can test it (if you don't mind).
Seems to be fine. Works with both cluster name and targets having - in them.
As you said, targets are currently not redistributed when the StatefulSet is scaled up. That is quite an important feature, but it is out of scope for this issue.
Thanks for testing it, I will write some docs about k8s based clustering before releasing.
Concerning redistribution, I think this can be done periodically (enabled via a knob, e.g. redistribution-interval: 5m) or triggered by an API request to the leader.
If you are interested in this, please open another issue we can follow it up there.
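To give the follow-up issue a starting point, here is one possible shape for the redistribution step, as a rough sketch only. Nothing like this exists in the release discussed here; `rebalance` is a hypothetical function, and it only caps the number of targets per instance at ceil(targets/instances), moving as few targets as needed to get there.

```go
package main

import (
	"fmt"
	"sort"
)

// rebalance returns a new target -> instance assignment in which no instance
// holds more than ceil(len(assign)/len(instances)) targets. Targets stay on
// their current instance when possible; the input map is left untouched.
func rebalance(assign map[string]string, instances []string) map[string]string {
	out := make(map[string]string, len(assign))
	if len(instances) == 0 {
		return out
	}
	limit := (len(assign) + len(instances) - 1) / len(instances) // ceil division
	load := make(map[string]int, len(instances))
	for _, inst := range instances {
		load[inst] = 0
	}
	targets := make([]string, 0, len(assign))
	for t := range assign {
		targets = append(targets, t)
	}
	sort.Strings(targets) // deterministic processing order

	var pending []string
	for _, t := range targets {
		owner := assign[t]
		if n, known := load[owner]; known && n < limit {
			out[t] = owner // keep the target where it is
			load[owner]++
		} else {
			pending = append(pending, t) // owner is gone or over the limit
		}
	}
	for _, t := range pending {
		best := instances[0]
		for _, inst := range instances[1:] {
			if load[inst] < load[best] {
				best = inst
			}
		}
		out[t] = best // move to the least loaded instance
		load[best]++
	}
	return out
}

func main() {
	current := map[string]string{
		"t1": "gnmic-ss-0", "t2": "gnmic-ss-0", "t3": "gnmic-ss-0", "t4": "gnmic-ss-0",
	}
	fmt.Println(rebalance(current, []string{"gnmic-ss-0", "gnmic-ss-1"}))
}
```

In a real implementation the leader would also have to release and re-acquire the corresponding locks for every moved target, which this sketch leaves out.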