Service Mesh
In the microservices era, a typical application might encompass hundreds of services; each service might run as multiple instances, and each of those instances may be in a constantly changing state as it is deployed and scheduled by an orchestrator. Organizations have by now largely systematized the packaging and deployment of services with the powerful abstractions provided by tools like Docker and Kubernetes, which drastically reduce the incremental operational burden of each deployment and, as a result, the cost of adopting microservices.
The real challenge of a microservices implementation is not building the services themselves but handling the communication between them. Typical concerns arising from this inter-service communication include (the sketch after this list shows the kind of code each service would otherwise have to carry):
- Load balancing
- Distributed tracing
- TLS encryption and mutual TLS authentication
- Resiliency
- Retry
- Circuit Breaker
- Service versioning, and more…
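Without a mesh, each service has to carry this kind of plumbing itself. Below is a minimal, illustrative sketch in Python, with a hypothetical `inventory` service URL and made-up retry/timeout values, of the retry-and-timeout boilerplate that a service mesh moves out of application code:

```python
import time
import urllib.error
import urllib.request

def call_inventory(path, retries=3, timeout=2.0, backoff=0.2):
    """Hand-written retries with exponential backoff and a per-request
    timeout; the peer service name and numbers are hypothetical."""
    url = f"http://inventory:8080{path}"
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries - 1:
                raise                           # give up after the last attempt
            time.sleep(backoff * 2 ** attempt)  # exponential backoff
```

Every service, in every language, ends up repeating some variant of this; a service mesh offloads exactly this kind of logic to a separate layer.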
Is it possible to handle all these challenges with a decentralized approach, without touching the application services?
Is it practicable to offload all of the inter-service communication requirements to a separate layer, keeping the service code independent, since these requirements are fairly generic across microservices implementations?
Can we provide a language-agnostic solution, given that application services may be written in different languages and communicate over standard protocols such as HTTP/1.x, HTTP/2, gRPC, WebSocket, or raw TCP?
Is it feasible to address these challenges on top of an already deployed microservices network, without touching it?
The “service mesh” was designed to address all of these concerns.
- A service mesh is a dedicated infrastructure layer for handling service-to-service communication and global cross-cutting concerns, making these communications more reliable, secure, observable, and manageable.
- This design standardizes the runtime operations of our applications in the same way that Docker and Kubernetes standardized deploy-time operations.
- A given microservice does not communicate directly with other microservices. Rather, all service-to-service communication takes place on top of a software component called a sidecar proxy.
Common features offered by a service mesh
The different functionalities offered through these features are:
- It enhances reliability by supporting circuit breaking, retries and timeouts, fault injection, fault handling, load balancing, and failover (see the circuit-breaker sketch after this list).
- It not only provides service discovery through a dedicated service registry but also tracks the messages that convey information about server locations; for instance, it guarantees message delivery and the proper return of errors when messages don’t reach their destinations.
- Primitive routing capabilities.
- It improves visibility by providing long-term metrics on service availability, along with capabilities like monitoring, distributed logging, and distributed tracing.
- Transport-level security (TLS) and key management.
- Simple blacklist- and whitelist-based access control.
- Native support for containers, Docker, and Kubernetes.
- It also often takes on more complex operational requirements, like A/B testing and canary releases.
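To make the circuit-breaking feature concrete, here is a toy fail-fast sketch of the behavior a mesh proxy applies on the caller's behalf; the class name, thresholds, and timings are hypothetical and not taken from any particular implementation:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: open the circuit after too many consecutive
    failures, then fail fast until a cool-down period has elapsed."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures   # hypothetical failure threshold
        self.reset_after = reset_after     # seconds to keep the circuit open
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                  # a success resets the counter
        return result
```

In a mesh, this state machine lives in the proxy and is configured declaratively, so the application code never has to implement it.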
In a service mesh, a given service that communicates with other services comprises the following (a sketch after this list shows how these concerns split):
- Business Logic
  - Logic related to its business functions and computations
- Primitive Network Functions
  - Basic, high-level network interactions used to connect to the service mesh / sidecar proxy
- Application Network Functions
  - Functions tightly coupled to the network, such as circuit breaking, timeouts, retries, client-side load balancing, etc.
- Control Plane
  - Supports capabilities like access control, observability, service discovery, etc.
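As a rough sketch (with hypothetical order and inventory services), the following shows how those concerns split: only the business logic and a primitive network call remain in the service process, while the application network functions and the control-plane concerns move to the sidecar proxy and the mesh:

```python
import json
import urllib.request

# Business logic: pure domain computation, no networking concerns.
def price_order(items):
    return sum(i["unit_price"] * i["quantity"] for i in items)

# Primitive network function: a plain request addressed to a peer service
# by name (hypothetical "inventory" service); the sidecar proxy intercepts
# this call transparently.
def reserve_stock(items):
    req = urllib.request.Request(
        "http://inventory:8080/reserve",
        data=json.dumps(items).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req).read()

# Application network functions (retries, timeouts, circuit breaking,
# client-side load balancing, mTLS) are carried out by the sidecar proxy,
# and the control plane supplies its policy (access control, discovery,
# observability), so neither shows up in this code.
```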
Service mesh proxy deployment models
The service mesh proxy can be deployed in two different patterns:
- Per-host proxy deployment
  - One proxy is deployed per host
  - A host can be a virtual machine, a physical host, or a Kubernetes worker node
  - Multiple instances of application services run on the host
  - All services on a given host route traffic through this one proxy instance
  - In Kubernetes, the proxy instance can be deployed as a DaemonSet
- Sidecar proxy deployment
  - One proxy is deployed per instance of every service
  - In Kubernetes, the service mesh sidecar container is deployed alongside the application service container as part of the same pod
  - This approach requires many more proxy instances, hence a smaller resource profile per sidecar is usually appropriate (the rough arithmetic after this list illustrates why)
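As a back-of-the-envelope comparison with entirely hypothetical numbers, the sketch below shows why the per-proxy footprint matters more in the sidecar model:

```python
# Hypothetical cluster: 20 hosts, 10 service instances per host.
hosts = 20
instances_per_host = 10

per_host_proxies = hosts                      # one proxy per host      -> 20
sidecar_proxies = hosts * instances_per_host  # one proxy per instance  -> 200

# Assumed footprints: 100 MB for a shared per-host proxy, 20 MB per sidecar.
per_host_total_mb = per_host_proxies * 100    # 2000 MB across the cluster
sidecar_total_mb = sidecar_proxies * 20       # 4000 MB across the cluster
```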
Request Flow – Service mesh, ingress, egress
- By default, proxies handle only traffic inside the service mesh cluster, between the source (downstream) and the destination (upstream) services.
- To expose a service that is part of the service mesh to the outside world, ingress traffic must be enabled.
- Similarly, if a service depends on an external service, egress traffic must be enabled.
Service Mesh implementations
Linkerd and Istio are two popular open-source service mesh implementations. Both follow a similar architecture but differ in their implementation mechanisms; we will explore Istio in more detail and then compare the features provided by the two implementations.
Istio
Without any changes to the service code, Istio eases the creation of a network of deployed services with:
- Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection (a conceptual sketch of weighted routing follows this list)
- Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
- A pluggable policy layer and configuration API supporting access controls, rate limits, and quotas
- Automatic metrics, logs, and traces for all traffic within a cluster, including ingress and egress
- Secure service-to-service communication in a cluster with strong identity-based authentication and authorization
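To illustrate the fine-grained routing mentioned above, the toy sketch below mimics what the data-plane proxy does for a weighted canary split between two hypothetical versions of a service; in Istio itself this is expressed declaratively through routing rules, not in application code:

```python
import random

# Hypothetical weighted destinations for a canary rollout:
# 90% of requests go to v1, 10% to v2.
routes = [
    {"destination": "reviews-v1:9080", "weight": 90},
    {"destination": "reviews-v2:9080", "weight": 10},
]

def pick_destination(routes):
    """Choose a backend in proportion to the configured weights,
    roughly what the proxy does for every request it forwards."""
    destinations = [r["destination"] for r in routes]
    weights = [r["weight"] for r in routes]
    return random.choices(destinations, weights=weights, k=1)[0]

print(pick_destination(routes))  # e.g. 'reviews-v1:9080' about 90% of the time
```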
Architecture
Istio is logically split into a data plane and a control plane:
- Data plane
  - Touches every packet/request in the system
  - Responsible for:
    - Service discovery
    - Health checking
    - Routing
    - Load balancing
    - Authentication / authorization
    - Observability
- Control plane
  - Provides policy and configuration for all of the running data planes in the mesh
  - Does not touch any packets/requests in the system
  - Turns all of the data planes into a distributed system (a toy sketch after this list illustrates the idea)
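As a toy sketch of that relationship (with hypothetical class and route names): the control plane only pushes configuration down to every data-plane proxy and never sees request traffic, which is what turns the independent proxies into one coordinated system:

```python
class DataPlaneProxy:
    """Toy data-plane proxy: holds routing config and forwards requests."""
    def __init__(self, name):
        self.name = name
        self.routes = {}

    def apply_config(self, routes):
        self.routes = dict(routes)   # configuration pushed by the control plane

    def handle(self, service):
        # Only the proxy touches the actual request path.
        return f"{self.name} -> {self.routes.get(service, 'no route')}"

class ControlPlane:
    """Toy control plane: distributes policy/config to all proxies in the mesh."""
    def __init__(self, proxies):
        self.proxies = proxies

    def push(self, routes):
        for proxy in self.proxies:
            proxy.apply_config(routes)

proxies = [DataPlaneProxy(f"sidecar-{i}") for i in range(3)]
ControlPlane(proxies).push({"reviews": "reviews-v1:9080"})  # hypothetical route
print(proxies[0].handle("reviews"))  # 'sidecar-0 -> reviews-v1:9080'
```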
The different features provided by Istio and Linkerd are compared in the table below.
Features | Istio | Linkerd |
---|---|---|
Proxy | Envoy | Finagle + Netty |
Circuit Breaker | Enforced at the network level | Two types: Fail Fast (session-driven) and Failure Accrual (request-driven) |
Dynamic Request Routing | To service instances by version or environment | Service destination (service name) and concrete destination (version and environment) |
Traffic Shifting/Splitting | Yes | Yes |
Service Discovery | Platform-agnostic service discovery | Namer (file-based service discovery); can also work with ZooKeeper, Consul, Kubernetes |
Load Balancing | Envoy’s load balancing algorithms; ejects unhealthy service instances from the load balancing pool | Finagle’s load balancing algorithms; provides failure- and latency-aware load balancing |
Security | Secures end-user-to-service and service-to-service communication | Easy to add TLS to all service-to-service calls; per-service/per-environment certificates; key management process |
Access Control | Enforces simple whitelist- or blacklist-based access control; enforces quotas and rate limits | - |
Observability | Metrics (Prometheus, statsd); monitoring (New Relic, Stackdriver); logging (application and access logs); distributed tracing (Zipkin) | Distributed tracing (Zipkin); metrics (InfluxDB, Prometheus, statsd) |
Deployment support | Kubernetes; sidecar proxy | Kubernetes, Mesos, cluster of hosts; per-host or sidecar proxy |
Control Plane | Pilot, Mixer, Istio-Auth | Namerd |
Service-to-Service communication | HTTP/1.1 or HTTP/2, gRPC, or TCP, with or without TLS | HTTP/1.1 or HTTP/2, gRPC, or TCP, with or without TLS |