Developers are expected to create over 750 million applications within the next three years. The catalyst enabling application development at this scale is the pervasive use of microservice-based application architectures. However, the interconnection of these microservices presents many challenges to operators, especially in the areas of security, observability, and management.
For instance, encryption is no longer simply an industry best practice; it may also be mandated by executive order or regulation. However, if developers must build encryption into their loosely coupled microservices (which are often written in many different languages and may span multiple organizations), this can not only add a heavy burden to their development cycles, pushing release dates back by weeks or even months, but also lead to considerable disparity and inconsistency in implementation, resulting in incompatibilities and even vulnerabilities. A service mesh can lift this burden entirely off developers and enforce consistency in securing these vital communications.
Or perhaps the business requirement is to gain visibility into how application microservices intercommunicate. Once these flows are encrypted, visibility is significantly reduced, if not lost altogether, in a blur of traffic that all shares the same characteristics up to Layer 7. Again, a service mesh can come to the rescue by giving operators pre- and post-encryption visibility, so they can observe these intra-application flows while maintaining end-to-end confidentiality.
Or maybe the business requirement is to maximize application availability and user experience. In a world where over half of mobile users abandon websites that take more than three seconds to load, the application needs both rock-solid availability and an excellent user experience to maintain customer loyalty and retention. Once again, a service mesh can deliver the required business result, in this case by using traffic-management policies to maximize application availability and performance.
What is a service mesh?
A service mesh is a layer of abstraction that provides extended and consistent functionality for microservice intercommunication. A simple way to think about it is as a dedicated network that runs its own control plane and provides additional, consistent capabilities to individual microservices.
For example, in the “olden” days, applications were tightly coupled to physical and/or virtual devices. If an application didn’t have built-in encryption, the network could provide it. Similarly, if the application didn’t have any access policies built in, its connected network could provide them. The same applied to other services, such as authentication, load balancing, logging, etc., as shown in Figure 1.
Figure 1: Extended Services Being Applied to Applications via the Network
It's worth noting that the network has a separate control plane from the application. As such, the services it extends to the application (such as access policy, encryption policy, etc.) are not tied to business logic and can be independently administered. Nor are the network services dependent on the language the application is written in, which makes them universally applicable.
So, too, with a service mesh. Ordinarily, each containerized microservice is wrapped within a single pod (a thin layer of abstraction used for container orchestration by Kubernetes), as shown on the left of Figure 2. A service mesh, however, injects an additional sidecar proxy into the pod to provide extended, yet consistent, capabilities for the containerized microservice. This sidecar proxy is managed by an independent control plane, as shown on the right of Figure 2, where an Envoy sidecar, managed by the Istio control plane, is used to add encryption, access control, and observability capabilities to the containerized application microservice.
Figure 2: Enabling a Service Mesh via a Sidecar Proxy
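In practice, with Istio this sidecar injection is typically enabled simply by labeling a Kubernetes namespace; the mesh's admission webhook then injects the Envoy proxy into every pod created there. The snippet below is a minimal sketch of that step (the namespace name "demo" is a hypothetical example):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo                  # hypothetical namespace name
  labels:
    # Istio's mutating admission webhook watches for this label and
    # automatically injects the Envoy sidecar proxy into every pod
    # subsequently created in this namespace.
    istio-injection: enabled
```

Note that nothing about the application containers themselves changes; the sidecar is added (or removed) purely through configuration.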
A service mesh can add significant functionality to application microservices, including, but not limited to:
- Service discovery
- Authentication and authorization
- Access control
- Encryption
- Routing policy
- Traffic-management policy
- Observability, including metrics, event logs, and traces
The best part is that these additional capabilities are added: 1) without any burden being placed on application developers, 2) in a uniform and consistent manner, and 3) in a way that allows them to be independently and centrally managed.
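As a hypothetical sketch of how such a capability is layered on, the following Istio AuthorizationPolicy adds access control to a workload purely through configuration, with no change to application code; the namespace, labels, and service-account name are assumptions made for the example:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: orders-allow-frontend
  namespace: demo                     # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: orders                     # hypothetical workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # Only workloads running as the "frontend" service account
        # (identified via their mTLS certificates) may call this service...
        principals: ["cluster.local/ns/demo/sa/frontend"]
    to:
    - operation:
        # ...and only with these HTTP methods.
        methods: ["GET", "POST"]
```

Because the policy lives in its own resource, it can be reviewed, versioned, and administered centrally, independent of the application's own release cycle.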
Which service mesh should I use?
Many service-mesh offerings exist, including Istio, Linkerd, Consul, Kuma, Traefik Mesh, AWS App Mesh, and many others.
The bottom line is that each of these service-mesh technologies has its own unique strengths and weaknesses. For this reason, each organization should choose the service-mesh technology whose strengths most tightly align with its business priorities, while weighing the corresponding weaknesses.
Cisco has chosen to build its service-mesh solution on Istio. The business requirement was to apply our 30 years of networking and security leadership to this new realm of networks, which meant we needed the service mesh with the most powerful and extensible feature set. And Istio is widely recognized as the most feature-rich and extensible service mesh.
For example, every service mesh offers standard mutual Transport Layer Security (mTLS), including certificate management, authentication, and authorization. Istio, however, goes beyond this: it can be configured to encrypt traffic mesh-wide, on specific links only, or even only for specific ports or protocols on an individual link.
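As a minimal sketch of what this looks like in practice, the policies below require mutual TLS mesh-wide while carving out a hypothetical exception for one port on a legacy workload; the namespace, label, and port number are assumptions for illustration:

```yaml
# Mesh-wide policy: require mutual TLS for all service-to-service traffic
# (applying it in the root namespace makes it mesh-wide).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Hypothetical per-workload exception: keep STRICT mTLS in general, but
# allow plaintext on one port that carries a legacy protocol.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: demo                     # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: legacy                     # hypothetical workload label
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:                             # hypothetical port number
      mode: DISABLE
```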
When it comes to observability, Istio provides extensive metrics for latency, load, and errors and supports easy integration with Prometheus and Grafana. For logs, it supports not only printing access logs from Envoy, but also exporting those logs to OpenTelemetry collectors. It also supports distributed tracing with Jaeger, Zipkin, and Lightstep, or with OpenTelemetry collectors via the OpenCensus agent.
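For instance, a common way to switch on Envoy access logging across the mesh is through the mesh configuration; the IstioOperator snippet below is a minimal sketch of that approach (the resource name is a hypothetical example):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: mesh-access-logging           # hypothetical name
  namespace: istio-system
spec:
  meshConfig:
    accessLogFile: /dev/stdout        # print Envoy access logs to stdout
    accessLogEncoding: JSON           # structured logs (the default is TEXT)
```

From there, the structured logs can be picked up by the cluster's existing log pipeline or shipped on to a collector.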
For traffic management, it supports request routing, fault injection, traffic shifting, mirroring/shadowing, locality load balancing, canary deployments, and circuit breaking.
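The sketch below combines two of these for a hypothetical "reviews" service with two versions: a weighted canary route that shifts 10% of traffic to v2, and a simple circuit breaker that ejects endpoints returning consecutive 5xx errors. All names, weights, and thresholds are assumptions for illustration:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews                      # hypothetical in-mesh service
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90                 # 90% of requests stay on v1
    - destination:
        host: reviews
        subset: v2
      weight: 10                 # 10% canary traffic to v2
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  trafficPolicy:
    outlierDetection:            # simple circuit breaking
      consecutive5xxErrors: 5    # eject an endpoint after 5 consecutive 5xx
      interval: 30s              # check every 30 seconds
      baseEjectionTime: 60s      # keep it ejected for at least a minute
```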
Istio was developed by Google, IBM, and Lyft, and was open sourced in 2017. It is regarded as one of the most mature service mesh technologies, as evidenced by the fact that it has been deployed by Adobe, Airbnb, Baidu, eBay, Google, Splunk, Salesforce, T-Mobile, and many other high-profile organizations.
There are some concerns, however, such as Istio's higher marginal CPU overhead and memory impact. Yet in neutral third-party performance testing, Istio was shown not only to deliver the highest levels of overall service (especially as loads increased), but also to have CPU, memory, and latency impacts rated as "not significant," "acceptable," and "very good," respectively.
Again, it bears repeating that what works for one company may not work for another, so it's important to identify the business priorities driving the case for a service mesh and evaluate the best fit.
The takeaway
A service mesh is a network within your cloud-native application that delivers many business benefits, including managed encryption, observability, and optimized application availability and user experience (for example, by enabling advanced use cases such as canary deployments and circuit breaking). Each service mesh has its own unique strengths and weaknesses, so it's important to familiarize yourself with them so you can best align your technology choice with your business priorities and constraints. Cisco has chosen to build its solutions on the Istio service mesh, as it is widely regarded as the most feature-rich and extensible service-mesh technology.