Supporting whole-site service shadow deployment
This requirement is driven by performance testing in production.
The idea is quite straightforward: EaseMesh manages all of the services deployed on Kubernetes, so we can use Kubernetes to replicate all of the service instances into another copy, which we call the "shadow service". After that, we can schedule test traffic to the shadow services for the test.
In other words, we need to finish the following work:
- Create a shadow copy of every service.
- Register all shadow services as a special kind of canary deployment, so that only specific traffic can be scheduled to them.
- Finally, allow all shadow services to be removed safely.

Note: since the shadow services still share the same database, Redis, and message queue with the production services, we are going to use the Java agent to redirect those connections to the test environment. This requirement is addressed by megaease/easeagent#99.
The shadow service needs to accept creation requests from other services, such as easeload, and from users through command-line operations. Therefore, it needs to:
- Provide an API for other services to call.
- Provide an emctl subcommand for operators to call (see the sketch below).
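For illustration only, here is a rough Go sketch of what a creation request against that API might look like; the endpoint path, port, and request schema below are assumptions, not a settled design:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ShadowServiceSpec is a hypothetical request body for shadowing either a
// whole namespace or a single deployment inside it.
type ShadowServiceSpec struct {
	Namespace  string `json:"namespace"`
	DeployName string `json:"deployName,omitempty"` // empty means all qualified deployments
}

func main() {
	spec := ShadowServiceSpec{Namespace: "spring-petclinic", DeployName: "vets-service"}
	body, _ := json.Marshal(spec)

	// Hypothetical extension endpoint exposed by the EaseMesh control plane.
	resp, err := http.Post(
		"http://easemesh-controlplane-svc.easemesh:2381/apis/extensions/shadowservices",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

The emctl subcommand would wrap the same API; the exact command name and flags are still to be designed.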
In addition to the Deployment objects created by MeshDeployment, EaseMesh also supports native Deployments, so we only need to copy the Deployment itself. By specifying a namespace and, optionally, a deployment name, we can copy either all qualified Deployments in the namespace or a single one.
Deployment replication rules are as follows (a Go sketch of the logic follows this list):
- The specified namespace must conform to the mutating-webhook (mutatehook) configuration.
- Copy only Deployments whose Deployment.Spec contains the easemesh-sidecar container.
- Remove the sidecar container, init containers, volumes, and other elements injected by the operator.
- Modify the original name, selector, and annotations, and add the shadow label.
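A minimal Go sketch of these rules, written against the client-go API types; the helper names (hasSidecar, renderShadowDeployment) are hypothetical rather than EaseMesh's actual implementation, and the numbered comments map to the markers in the shadow example below:

```go
package shadow

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

const (
	sidecarName    = "easemesh-sidecar"
	shadowLabelKey = "mesh.megaease.com/mesh-shadow-service"
)

// injectedVolumes lists the volumes added by the EaseMesh operator.
var injectedVolumes = map[string]bool{"agent-volume": true, "sidecar-volume": true}

// hasSidecar reports whether the operator injected the mesh sidecar,
// i.e. whether the Deployment qualifies for shadowing.
func hasSidecar(d *appsv1.Deployment) bool {
	for _, c := range d.Spec.Template.Spec.Containers {
		if c.Name == sidecarName {
			return true
		}
	}
	return false
}

// renderShadowDeployment clones a qualified Deployment into its shadow copy.
func renderShadowDeployment(src *appsv1.Deployment) *appsv1.Deployment {
	shadow := src.DeepCopy()

	// 2. Change the Deployment name; clear server-populated fields.
	shadow.Name = src.Name + "-shadow"
	shadow.ResourceVersion, shadow.UID = "", ""

	// Strip the sidecar container, init containers, and injected volumes/mounts.
	var containers []corev1.Container
	for _, c := range shadow.Spec.Template.Spec.Containers {
		if c.Name == sidecarName {
			continue
		}
		var mounts []corev1.VolumeMount
		for _, m := range c.VolumeMounts {
			if !injectedVolumes[m.Name] {
				mounts = append(mounts, m)
			}
		}
		c.VolumeMounts = mounts
		containers = append(containers, c)
	}
	shadow.Spec.Template.Spec.Containers = containers
	shadow.Spec.Template.Spec.InitContainers = nil

	var volumes []corev1.Volume
	for _, v := range shadow.Spec.Template.Spec.Volumes {
		if !injectedVolumes[v.Name] {
			volumes = append(volumes, v)
		}
	}
	shadow.Spec.Template.Spec.Volumes = volumes

	// 1. Add the shadow service label to the annotations.
	if shadow.Annotations == nil {
		shadow.Annotations = map[string]string{}
	}
	shadow.Annotations["mesh.megaease.com/service-labels"] = "version=shadow"

	// 3./4. Mark the selector and the pod template so shadow pods are distinguishable.
	if shadow.Spec.Selector.MatchLabels == nil {
		shadow.Spec.Selector.MatchLabels = map[string]string{}
	}
	shadow.Spec.Selector.MatchLabels[shadowLabelKey] = "true"
	if shadow.Spec.Template.Labels == nil {
		shadow.Spec.Template.Labels = map[string]string{}
	}
	shadow.Spec.Template.Labels[shadowLabelKey] = "true"

	return shadow
}
```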
Examples are as follows:
- Original Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"mesh.megaease.com/alive-probe-url":"http://localhost:9900/health","mesh.megaease.com/service-name":"vets-service"},"name":"vets-service","namespace":"spring-petclinic"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"vets-service"}},"template":{"metadata":{"labels":{"app":"vets-service"}},"spec":{"containers":[{"args":["-c","java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom org.springframework.boot.loader.JarLauncher"],"command":["/bin/sh"],"image":"megaease/spring-petclinic-vets-service:latest","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["sh","-c","sleep 10"]}}},"name":"vets-service","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"200m","memory":"256Mi"}},"volumeMounts":[{"mountPath":"/application/application-sit.yml","name":"configmap-volume-0","subPath":"application-sit.yml"}]}],"restartPolicy":"Always","volumes":[{"configMap":{"defaultMode":420,"items":[{"key":"application-sit-yml","path":"application-sit.yml"}],"name":"vets-service"},"name":"configmap-volume-0"}]}}}}
    mesh.megaease.com/alive-probe-url: http://localhost:9900/health
    mesh.megaease.com/service-name: vets-service
  name: vets-service
  namespace: spring-petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vets-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: vets-service
    spec:
      containers:
      - args:
        - -c
        - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom org.springframework.boot.loader.JarLauncher
        command:
        - /bin/sh
        env:
        - name: JAVA_TOOL_OPTIONS
          value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
        image: megaease/spring-petclinic-vets-service:latest
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - sh
              - -c
              - sleep 10
        name: vets-service
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /application/application-sit.yml
          name: configmap-volume-0
          subPath: application-sit.yml
        - mountPath: /agent-volume
          name: agent-volume
      - command:
        - /bin/sh
        - -c
        - /opt/easegress/bin/easegress-server -f /sidecar-volume/sidecar-config.yaml
        env:
        - name: APPLICATION_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 172.20.2.189:5001/megaease/easegress:server-sidecar
        imagePullPolicy: IfNotPresent
        name: easemesh-sidecar
        ports:
        - containerPort: 13001
          name: sidecar-ingress
          protocol: TCP
        - containerPort: 13002
          name: sidecar-egress
          protocol: TCP
        - containerPort: 13009
          name: sidecar-eureka
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /sidecar-volume
          name: sidecar-volume
      initContainers:
      - command:
        - sh
        - -c
        - "set -e\ncp -r /easeagent-volume/* /agent-volume\n\necho 'name: vets-service\ncluster-join-urls:
          http://easemesh-controlplane-svc.easemesh:2380\ncluster-request-timeout:
          10s\ncluster-role: reader\ncluster-name: easemesh-control-plane\nlabels:\n
          \ alive-probe: http://localhost:9900/health\n application-port: 8080\n
          \ mesh-service-labels: \n mesh-servicename: vets-service\n' > /sidecar-volume/sidecar-config.yaml"
        image: 172.20.2.189:5001/megaease/easeagent-initializer:latest
        imagePullPolicy: IfNotPresent
        name: initializer
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /agent-volume
          name: agent-volume
        - mountPath: /sidecar-volume
          name: sidecar-volume
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: application-sit-yml
            path: application-sit.yml
          name: vets-service
        name: configmap-volume-0
      - emptyDir: {}
        name: agent-volume
      - emptyDir: {}
        name: sidecar-volume
```
- Shadow Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"mesh.megaease.com/alive-probe-url":"http://localhost:9900/health","mesh.megaease.com/service-name":"vets-service"},"name":"vets-service","namespace":"spring-petclinic"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"vets-service"}},"template":{"metadata":{"labels":{"app":"vets-service"}},"spec":{"containers":[{"args":["-c","java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom org.springframework.boot.loader.JarLauncher"],"command":["/bin/sh"],"image":"megaease/spring-petclinic-vets-service:latest","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["sh","-c","sleep 10"]}}},"name":"vets-service","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"200m","memory":"256Mi"}},"volumeMounts":[{"mountPath":"/application/application-sit.yml","name":"configmap-volume-0","subPath":"application-sit.yml"}]}],"restartPolicy":"Always","volumes":[{"configMap":{"defaultMode":420,"items":[{"key":"application-sit-yml","path":"application-sit.yml"}],"name":"vets-service"},"name":"configmap-volume-0"}]}}}}
    mesh.megaease.com/alive-probe-url: http://localhost:9900/health
    mesh.megaease.com/service-name: vets-service
    # 1. Add the shadow service label
    mesh.megaease.com/service-labels: "version=shadow"
  # 2. Change the Deployment name
  name: vets-service-shadow
  namespace: spring-petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vets-service
      # 3. Add a new selector for shadow pods
      mesh.megaease.com/mesh-shadow-service: "true"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: vets-service
        # 4. Add a new label for shadow pods
        mesh.megaease.com/mesh-shadow-service: "true"
    spec:
      containers:
      - args:
        - -c
        - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom org.springframework.boot.loader.JarLauncher
        command:
        - /bin/sh
        env:
        - name: JAVA_TOOL_OPTIONS
          value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
        image: megaease/spring-petclinic-vets-service:latest
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - sh
              - -c
              - sleep 10
        name: vets-service
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /application/application-sit.yml
          name: configmap-volume-0
          subPath: application-sit.yml
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: application-sit-yml
            path: application-sit.yml
          name: vets-service
        name: configmap-volume-0
```
Let me clarify the shadow service architecture. There are several parts in the architecture:
Shadow service: an in-cluster service that accepts requests forwarded from the EaseMesh control plane (Easegress). When a request comes in, the shadow service fetches the Deployment information from the K8s cluster, renders a new shadow Deployment with the annotations described above, and sends a create request to the K8s API server to deploy it.
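A hedged Go sketch of that create path, assuming client-go and a hypothetical handler layout (the request schema mirrors the earlier API sketch, and renderShadowDeployment stands in for the rendering helper sketched in the replication-rules section):

```go
package shadowserver

import (
	"encoding/json"
	"net/http"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// renderShadowDeployment stands in for the rendering helper sketched earlier.
func renderShadowDeployment(src *appsv1.Deployment) *appsv1.Deployment {
	return src.DeepCopy() // elided; see the replication-rules sketch
}

// Server handles shadow-creation requests forwarded by the control plane.
type Server struct {
	client kubernetes.Interface
}

// createShadow fetches the source Deployment, renders its shadow copy, and
// asks the K8s API server to create it.
func (s *Server) createShadow(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Namespace  string `json:"namespace"`
		DeployName string `json:"deployName"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// Fetch the deployment information from the K8s cluster.
	src, err := s.client.AppsV1().Deployments(req.Namespace).
		Get(r.Context(), req.DeployName, metav1.GetOptions{})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Render the shadow deployment and send the create request.
	shadow := renderShadowDeployment(src)
	if _, err := s.client.AppsV1().Deployments(req.Namespace).
		Create(r.Context(), shadow, metav1.CreateOptions{}); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated)
}
```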
MeshController: the endpoint of the EaseMesh control plane, which manages the specifications and the service registry of the mesh. emctl and other services can integrate with EaseMesh through it.
Why do we need a standalone shadow service?
First, the logic implemented by the shadow service doesn't belong to the scope of the common mesh. Second, it communicates heavily with K8s, which is very different from the EaseMesh controller implementation. Third, like K8s, the EaseMesh control plane needs a mechanism that allows developers to customize EaseMesh. After several days of study, I found that K8s has a mechanism to extend the K8s API server: the API aggregation layer [1]. I think we could leverage its design to extend the EaseMesh control plane and implement this requirement.
[1] https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation
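To make the aggregation idea concrete, here is a rough Go sketch of how the control plane could forward a reserved API prefix to a registered customized service, analogous to the K8s aggregation layer; the prefix, port, and service address are assumptions for illustration:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical in-cluster address of the shadow service.
	target, err := url.Parse("http://shadow-service.easemesh:8080")
	if err != nil {
		log.Fatal(err)
	}

	// Mesh-native APIs stay in MeshController; requests under the reserved
	// extension prefix are proxied to the registered customized service.
	proxy := httputil.NewSingleHostReverseProxy(target)
	mux := http.NewServeMux()
	mux.Handle("/apis/extensions/shadowservices/", proxy)

	// Hypothetical listen port for the control-plane API.
	log.Fatal(http.ListenAndServe(":2381", mux))
}
```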
Some Q&A:
- Can emctl communicate with a customized service (currently only the shadow service) directly?
Like the K8s API server, which is the one and only gateway for all K8s APIs, MeshController will also support extension APIs, so it's better to use a single access gateway.
- Does a new customized service mean adding a new API (a new image) to MeshController (in Easegress)?
No, the extensible APIs of MeshController will cover it from the start.
- Does EaseMesh send all information to the shadow service, instead of the shadow service fetching it itself?
It depends; we need to design general APIs (mesh-service-dedicated info, general storage, etc.) for customized services.