megaease/easemesh

Supporting a whole site service shadow deployment

Closed this issue · 4 comments

haoel commented

This requirement is needed for performance testing in production.

The idea is quite straightforward: EaseMesh manages all of the services deployed on Kubernetes, so we can use Kubernetes to replicate all of the service instances into another copy. We call this copy the "Shadow Service". After that, we can schedule the test traffic to the shadow services for testing.

In other words, we aim to complete the following work:

  • Make a copy of every service as a shadow.
  • Register all shadow services as a special kind of canary deployment, so that only specific traffic can be scheduled to them.
  • Finally, all shadow services can be removed safely.

Note: as those shadow services still share the same database, Redis, or message queue with the production services, we will use the JavaAgent to redirect the connections to the test environment. This requirement is addressed by megaease/easeagent#99.

The shadow service needs to support creation requests from other services, such as easeload, or from users through command-line operations. Therefore, it needs to:

  1. Provide an API interface for other services to call.
  2. Provide an emctl subcommand for operators to call.

In addition to the Deployment objects created by MeshDeployment, EaseMesh also supports native Deployments, so we only need to copy the Deployment. By specifying the namespace and deployment name, we can copy either all qualified Deployments in a namespace or a single one.
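As a rough illustration of the "all or a single Deployment" request described above, the payload could carry a namespace plus an optional deployment name. This is a hedged sketch; the `ShadowRequest` type and its field names are hypothetical, not the final EaseMesh API.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json

# Hypothetical shape of a shadow-creation request; field names are
# illustrative only.
@dataclass
class ShadowRequest:
    namespace: str                    # must conform to the mutatehook config
    deployment: Optional[str] = None  # None means "all qualified Deployments"

def encode(req: ShadowRequest) -> str:
    """Serialize the request for the HTTP API or an emctl subcommand."""
    return json.dumps(asdict(req))
```

The same payload shape could back both the API interface and the emctl subcommand, so the two entry points stay consistent.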

Deployment replication rules are as follows:

    1. The specified namespace must conform to the mutatehook configuration.
    2. Copy only Deployments whose Deployment.Spec contains the easemesh-sidecar container.
    3. Remove the sidecar container, init container, volumes, and other elements injected by the operator.
    4. Modify the original name, selector, and annotations, and add the shadow tag.
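The replication rules above amount to a small transformation over the Deployment object. Below is a hedged sketch operating on a Deployment parsed into a dict; the sidecar name and label/annotation keys follow the example manifests, while the function name `make_shadow` and the `INJECTED_VOLUMES` set are hypothetical (rule 1, the mutatehook namespace check, is assumed to happen before this call).

```python
import copy

SIDECAR_NAME = "easemesh-sidecar"
INJECTED_VOLUMES = {"agent-volume", "sidecar-volume"}  # assumed operator-injected
SHADOW_LABEL = "mesh.megaease.com/mesh-shadow-service"

def make_shadow(deployment: dict) -> dict:
    """Render a shadow copy of a mesh-managed Deployment."""
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    # Rule 2: only Deployments that carry the easemesh-sidecar qualify.
    if not any(c["name"] == SIDECAR_NAME for c in containers):
        raise ValueError("not a mesh-managed Deployment")

    shadow = copy.deepcopy(deployment)
    pod = shadow["spec"]["template"]
    pod_spec = pod["spec"]
    # Rule 3: drop the sidecar container, init containers, and injected volumes.
    pod_spec["containers"] = [c for c in pod_spec["containers"]
                              if c["name"] != SIDECAR_NAME]
    pod_spec.pop("initContainers", None)
    pod_spec["volumes"] = [v for v in pod_spec.get("volumes", [])
                           if v["name"] not in INJECTED_VOLUMES]
    for c in pod_spec["containers"]:
        c["volumeMounts"] = [m for m in c.get("volumeMounts", [])
                             if m["name"] not in INJECTED_VOLUMES]
    # Rule 4: rename the copy and tag it as a shadow.
    shadow["metadata"]["name"] += "-shadow"
    shadow["metadata"].setdefault("annotations", {})[
        "mesh.megaease.com/service-labels"] = "version=shadow"
    shadow["spec"]["selector"]["matchLabels"][SHADOW_LABEL] = "true"
    pod["metadata"]["labels"][SHADOW_LABEL] = "true"
    return shadow
```

Note the deep copy: the source Deployment object is left untouched so the original workload keeps serving production traffic.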

Examples are as follows:

  • Original Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"mesh.megaease.com/alive-probe-url":"http://localhost:9900/health","mesh.megaease.com/service-name":"vets-service"},"name":"vets-service","namespace":"spring-petclinic"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"vets-service"}},"template":{"metadata":{"labels":{"app":"vets-service"}},"spec":{"containers":[{"args":["-c","java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher"],"command":["/bin/sh"],"image":"megaease/spring-petclinic-vets-service:latest","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["sh","-c","sleep 10"]}}},"name":"vets-service","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"200m","memory":"256Mi"}},"volumeMounts":[{"mountPath":"/application/application-sit.yml","name":"configmap-volume-0","subPath":"application-sit.yml"}]}],"restartPolicy":"Always","volumes":[{"configMap":{"defaultMode":420,"items":[{"key":"application-sit-yml","path":"application-sit.yml"}],"name":"vets-service"},"name":"configmap-volume-0"}]}}}}
    mesh.megaease.com/alive-probe-url: http://localhost:9900/health
    mesh.megaease.com/service-name: vets-service
  name: vets-service
  namespace: spring-petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vets-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: vets-service
    spec:
      containers:
      - args:
        - -c
        - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher
        command:
        - /bin/sh
        env:
        - name: JAVA_TOOL_OPTIONS
          value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
        image: megaease/spring-petclinic-vets-service:latest
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - sh
              - -c
              - sleep 10
        name: vets-service
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /application/application-sit.yml
          name: configmap-volume-0
          subPath: application-sit.yml
        - mountPath: /agent-volume
          name: agent-volume
      - command:
        - /bin/sh
        - -c
        - /opt/easegress/bin/easegress-server -f /sidecar-volume/sidecar-config.yaml
        env:
        - name: APPLICATION_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 172.20.2.189:5001/megaease/easegress:server-sidecar
        imagePullPolicy: IfNotPresent
        name: easemesh-sidecar
        ports:
        - containerPort: 13001
          name: sidecar-ingress
          protocol: TCP
        - containerPort: 13002
          name: sidecar-egress
          protocol: TCP
        - containerPort: 13009
          name: sidecar-eureka
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /sidecar-volume
          name: sidecar-volume
      initContainers:
      - command:
        - sh
        - -c
        - "set -e\ncp -r /easeagent-volume/* /agent-volume\n\necho 'name: vets-service\ncluster-join-urls:
          http://easemesh-controlplane-svc.easemesh:2380\ncluster-request-timeout:
          10s\ncluster-role: reader\ncluster-name: easemesh-control-plane\nlabels:\n
          \ alive-probe: http://localhost:9900/health\n  application-port: 8080\n
          \ mesh-service-labels: \n  mesh-servicename: vets-service\n' > /sidecar-volume/sidecar-config.yaml"
        image: 172.20.2.189:5001/megaease/easeagent-initializer:latest
        imagePullPolicy: IfNotPresent
        name: initializer
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /agent-volume
          name: agent-volume
        - mountPath: /sidecar-volume
          name: sidecar-volume
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: application-sit-yml
            path: application-sit.yml
          name: vets-service
        name: configmap-volume-0
      - emptyDir: {}
        name: agent-volume
      - emptyDir: {}
        name: sidecar-volume
  • Shadow Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"mesh.megaease.com/alive-probe-url":"http://localhost:9900/health","mesh.megaease.com/service-name":"vets-service"},"name":"vets-service","namespace":"spring-petclinic"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"vets-service"}},"template":{"metadata":{"labels":{"app":"vets-service"}},"spec":{"containers":[{"args":["-c","java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher"],"command":["/bin/sh"],"image":"megaease/spring-petclinic-vets-service:latest","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["sh","-c","sleep 10"]}}},"name":"vets-service","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"200m","memory":"256Mi"}},"volumeMounts":[{"mountPath":"/application/application-sit.yml","name":"configmap-volume-0","subPath":"application-sit.yml"}]}],"restartPolicy":"Always","volumes":[{"configMap":{"defaultMode":420,"items":[{"key":"application-sit-yml","path":"application-sit.yml"}],"name":"vets-service"},"name":"configmap-volume-0"}]}}}}
    mesh.megaease.com/alive-probe-url: http://localhost:9900/health
    mesh.megaease.com/service-name: vets-service
    # 1. add the shadow service label
    mesh.megaease.com/service-labels: "version=shadow"
  # 2. change the Deployment name
  name: vets-service-shadow
  namespace: spring-petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vets-service
      # 3. add a new selector for shadow pods
      mesh.megaease.com/mesh-shadow-service: "true"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: vets-service
        # 4. add a new label for shadow pods
        mesh.megaease.com/mesh-shadow-service: "true"
    spec:
      containers:
      - args:
        - -c
        - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher
        command:
        - /bin/sh
        env:
        - name: JAVA_TOOL_OPTIONS
          value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
        image: megaease/spring-petclinic-vets-service:latest
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - sh
              - -c
              - sleep 10
        name: vets-service
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /application/application-sit.yml
          name: configmap-volume-0
          subPath: application-sit.yml
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: application-sit-yml
            path: application-sit.yml
          name: vets-service
        name: configmap-volume-0

@haoel @zhao-kun @xxx7xxxx

Let me clarify the shadow service architecture through a diagram:
[architecture diagram]

There are several parts in the architecture:

Shadow service: an in-cluster service that accepts requests forwarded from the EaseMesh control plane (Easegress). When a request comes in, the Shadow service fetches the deployment information from the K8s cluster; after rendering a new shadow deployment with annotations, it sends a create request to the K8s API server to deploy the new Deployment.
MeshController: the endpoint of the EaseMesh control plane, which manages the specifications and service registry of the mesh. emctl and other services can integrate with EaseMesh through it.
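The Shadow service request flow described above (fetch Deployments, render shadow copies, submit them to the K8s API server) can be sketched with the client calls abstracted away. All names here are hypothetical; the callables stand in for the real K8s client and the rendering step.

```python
from typing import Callable, Iterable

def deploy_shadows(namespace: str,
                   list_deployments: Callable[[str], Iterable[dict]],
                   render_shadow: Callable[[dict], dict],
                   create_deployment: Callable[[str, dict], None]) -> int:
    """Create one shadow Deployment per source Deployment; returns the count."""
    created = 0
    for dep in list_deployments(namespace):
        # Render the shadow copy and submit it to the API server.
        create_deployment(namespace, render_shadow(dep))
        created += 1
    return created
```

Keeping the K8s interaction behind injected callables also reflects the point below: this logic is heavily K8s-bound and distinct from the rest of the mesh controller.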

Why do we need a standalone Shadow service?

First, the logic implemented by the Shadow service doesn't belong to the scope of a common mesh. Second, it communicates heavily with K8s and is very different from the EaseMesh controller implementation. Third, like K8s, the EaseMesh control plane needs a mechanism that allows developers to customize EaseMesh. After several days of study, I found that K8s has a mechanism to extend the K8s API server: the API Aggregation layer [1]. I think we could leverage its design to extend the EaseMesh control plane and implement this requirement.

[1]https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation

Some Q&A:

  1. Can emctl communicate with a customized service (currently only the shadow service) directly?

We use the K8s API server as the one and only gateway for all APIs, and MeshController will also support extension APIs, so it's better to keep a single access gateway.

  2. Does a new customized service mean adding a new API (new image) for MeshController (in Easegress)?

No, the extensible APIs of MeshController will cover it from the start.

  3. Does EaseMesh push all information to the shadow service, instead of the service fetching it itself?

It depends; we need to design general APIs (mesh-service-dedicated info, general storage, etc.) for customized services.

Related to #107. It will be closed after #109 is accepted.