Those of us who live inside the world of technology are used to the continuous wave of new terms and technologies, which very often are just noise and marketing. Fortunately, every now and then the waves wash up a treasure. This time, they have left Service Mesh on the shore.
Monolithic architectures have long been recognized as an obstacle to improving the maintainability, scalability, and stability of applications, and to easing the work of development teams. Over the past 20 years, there has been a move towards increased decoupling and autonomy of the components of an application. We have moved from SOA architectures, through message buses, to microservices.
In recent years, the rise of container technologies has accelerated this evolution. It has provided the development and operations ecosystem with infrastructure and tools that facilitate, and to some extent push, the adoption of microservices as building blocks for applications. What used to be a single block has now become dozens of small autonomous components (microservices), each with a very specific function. This design pattern is analogous to the one the UNIX operating system has used since its inception. Instead of providing complex, heavyweight programs with dozens of functionalities, the environment provides small utilities that are highly specialised in performing one specific and simple task. At the same time, it offers a series of interconnection mechanisms that allow them to be combined to solve more complex problems. This operating system pattern has been successful for the past 50 years, which makes it a good example to keep in mind when designing application architectures.
The spread of container platforms, with Kubernetes at the forefront, has filled data centres with thousands of containers. Applications now consist of tens or hundreds of containers, whose communication pattern with each other constitutes their nervous system, just like a living organism. This communication takes place over the existing network. In most cases, it happens through the network of a Kubernetes cluster. And this is where the first difficulties arise.
In a simple classic application, with its typical frontend and backend layers, traffic flows are well defined and easily traceable. Now imagine what happens in an application based on dozens of microservices, spread across several physical systems, or even across a hybrid infrastructure hosted partly in a data centre and partly in a public cloud. The number of potential communication flows grows far faster than the number of components, and keeping track of them becomes an unmanageable task. The thought of monitoring these flows or solving a functional or performance problem is nerve-racking.
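To give a rough sense of this growth, here is a minimal sketch that counts the worst case, assuming every microservice may call every other one. Real applications use only a subset of these paths, and the function name `potential_flows` is purely illustrative.

```python
def potential_flows(services: int) -> int:
    """Upper bound on directed caller -> callee pairs among `services` microservices.

    Assumption: any service may call any other service (worst case).
    """
    if services < 0:
        raise ValueError("services must be non-negative")
    return services * (services - 1)


# Compare the number of components with the number of possible flows.
for n in (3, 10, 50, 100):
    print(n, potential_flows(n))
```

Under this assumption, going from 10 to 100 services multiplies the components by 10 but the possible flows by more than 100.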
Features that would greatly facilitate the life cycle of applications
Besides overcoming this challenge, it soon became clear that this model lacked a series of functionalities that would greatly facilitate the lifecycle of applications, such as:
- Load balancing: The ability to spread traffic across multiple instances of the same microservice.
- Intelligent routing: Making policy-based routing decisions according to time periods, the state of other services, traffic type or traffic content, for example. This functionality is essential for adopting A/B, blue/green or canary deployment models.
- Service discovery: Considering the complexity that a microservice-based application can reach, it is very convenient to have a service discovery mechanism. This way, a microservice needing to communicate with another microservice knows where to find it.
- Resilience: The ability to redirect traffic to a back-up service upon failure of the primary service.
- Observability: In the world of monolithic applications, the interactions between components can be traced using debugging and profiling tools. In the world of microservices, these interactions are highly complex, dynamic, network-level communication flows. It is convenient to be able to monitor and analyse these interactions to diagnose problems, optimise performance or forecast capacity, among other things.
- Security: Data between the different microservices should travel encrypted, and both ends should validate each other using digital certificates, as there is no control (from the application layer) of the networks over which the data travels. It would also be desirable to be able to manage permissions to prevent all unauthorized communication flows, thus considerably improving the security of the application.
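Three of the features listed above (load balancing, canary routing and resilience) can be sketched in a few lines. This is an illustrative toy, not how any particular product implements them: the names `RoundRobinBalancer`, `canary_route` and `call_with_failover`, the addresses and the use of `ConnectionError` to signal a failed service are all assumptions made for the example.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Load balancing: hand out the instances of one microservice in turn."""

    def __init__(self, instances: Sequence[str]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))

    def pick(self) -> str:
        return next(self._cycle)


def canary_route(request_id: int, canary_percent: int) -> str:
    """Intelligent routing: send a fixed share of requests to the canary version."""
    return "canary" if request_id % 100 < canary_percent else "stable"


def call_with_failover(primary: Callable[[], str], backup: Callable[[], str]) -> str:
    """Resilience: fall back to the back-up service if the primary fails."""
    try:
        return primary()
    except ConnectionError:
        return backup()


def failing_primary() -> str:
    raise ConnectionError("primary is down")


balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
print([balancer.pick() for _ in range(4)])
print(sum(canary_route(i, 10) == "canary" for i in range(100)))
print(call_with_failover(failing_primary, lambda: "served by backup"))
```

Even in this reduced form, each feature is a piece of logic that would otherwise have to be written, tested and maintained inside every microservice.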
It does not seem reasonable to ask development teams to implement these functionalities in their microservices, mainly because of the considerable increase in time and cost. It makes more sense to create libraries that implement these functionalities, so that they can be embedded in applications. This was the first approach (Google’s Stubby, Netflix’s Hystrix or Twitter’s Finagle), yet it soon became clear that such libraries were very complex and costly to maintain. For example, one of the motivations for using microservices is that each of them can be written in the language that the development team in charge considers most appropriate, independently of the rest of the microservices. Such diversity of development environments must be carried over to these libraries, forcing their developers to port the same functionalities to dozens of languages. Moreover, whenever a vulnerability is fixed or a problem is solved in a library, all the microservices that use it must be rebuilt, possibly as a new version, and the application must be redeployed.
Therefore, it made sense to separate these functionalities from the microservices themselves and to make them agnostic to the implementation details of each microservice. This is achieved by placing a proxy local to each microservice to manage its incoming and outgoing communications. From the microservice’s point of view, its only interface to the world is this proxy, whether it has to accept connections or needs to communicate with another component of the application. The proxy also handles load balancing, traffic management, security, etc., transparently to the application. Thanks to container technology, the implementation of these proxies is independent of the technology used in the associated microservice.
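The role of this local proxy can be illustrated with a small in-memory sketch. It assumes that services are plain Python callables and that a failed instance raises `ConnectionError`; the `SidecarProxy` class, its registry and its metric names are hypothetical and do not correspond to the API of any real proxy.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], str]


class SidecarProxy:
    """Toy local proxy: discovery, balancing, retries and metrics in one place."""

    def __init__(self, registry: Dict[str, List[Handler]], max_attempts: int = 2) -> None:
        self._registry = registry          # service name -> known instances
        self._max_attempts = max_attempts  # attempts per outgoing call
        self._next_index: Counter = Counter()
        self.metrics: Counter = Counter()

    def call(self, service: str, payload: str) -> str:
        instances = self._registry.get(service)
        if not instances:
            self.metrics[f"{service}.unknown"] += 1
            raise LookupError(f"no instances registered for {service!r}")
        last_error: Optional[Exception] = None
        for _ in range(self._max_attempts):
            # Round-robin over the instances of the target service.
            index = self._next_index[service] % len(instances)
            self._next_index[service] += 1
            try:
                response = instances[index](payload)
            except ConnectionError as exc:
                self.metrics[f"{service}.failure"] += 1
                last_error = exc
                continue
            self.metrics[f"{service}.success"] += 1
            return response
        raise ConnectionError(f"all attempts to {service!r} failed") from last_error


def healthy(payload: str) -> str:
    return f"ok:{payload}"


def broken(payload: str) -> str:
    raise ConnectionError("instance down")


# The microservice only ever talks to its proxy, never to "orders" directly.
proxy = SidecarProxy({"orders": [broken, healthy]})
print(proxy.call("orders", "ping"))
print(dict(proxy.metrics))
```

The calling code contains no retry, discovery or metrics logic of its own, which is the point of the pattern: those concerns live in the proxy and can be changed without rebuilding the microservice.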
This network of proxies is the data plane of the application, which manages the communication between all its components. The configuration and supervision of this data plane is managed by the corresponding control plane. Both data and control planes allow for the establishment of a communication mesh, which we call service mesh. Some examples of implementations are Linkerd, Istio or Consul Connect.
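The split between the two planes can be sketched as follows: a control plane keeps the desired routing configuration and pushes it to every registered proxy, while the proxies (the data plane) simply apply what they receive. The `Proxy` and `ControlPlane` classes, the version counter and the addresses are assumptions made for illustration; real implementations use their own configuration APIs and protocols, which are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    """Data plane element: holds the routes it was last given."""

    service: str
    routes: Dict[str, List[str]] = field(default_factory=dict)
    config_version: int = 0

    def apply(self, version: int, routes: Dict[str, List[str]]) -> None:
        # Copy the configuration so later changes only arrive through a push.
        self.routes = {name: list(addrs) for name, addrs in routes.items()}
        self.config_version = version


class ControlPlane:
    """Keeps the desired configuration and distributes it to all proxies."""

    def __init__(self) -> None:
        self._proxies: List[Proxy] = []
        self._routes: Dict[str, List[str]] = {}
        self._version = 0

    def register(self, proxy: Proxy) -> None:
        self._proxies.append(proxy)
        proxy.apply(self._version, self._routes)  # bring the newcomer up to date

    def set_route(self, service: str, addresses: List[str]) -> None:
        self._routes[service] = list(addresses)
        self._version += 1
        for proxy in self._proxies:
            proxy.apply(self._version, self._routes)


control = ControlPlane()
frontend, orders = Proxy("frontend"), Proxy("orders")
control.register(frontend)
control.register(orders)
control.set_route("payments", ["10.0.1.5:9000", "10.0.1.6:9000"])
print(frontend.config_version, frontend.routes)
```

A single change made in the control plane reaches every proxy, so operators configure the mesh in one place rather than touching each microservice.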
Conceptually, the result is an overlay network on top of the existing network infrastructure. This type of network arises as a way to provide functionalities that the underlying network (the underlay) lacks. Some examples of such networks are:
- The Tor network, created to guarantee the anonymity of users, something that the Internet cannot do natively.
- VPNs, developed to provide security in the form of encrypted communications and peer-to-peer authentication.
- Kubernetes CNI plugins, such as Weave, Flannel, or Calico, which provide a flat network between containers regardless of which physical servers in the cluster host them.
Generally, the appearance of an overlay is of great concern to communications and security managers in organisations, as it is beyond their control. For instance, an overlay network could interconnect services that in the underlay would be isolated by security policies. It is also common that, over time, some of the functionalities that motivated the creation of the overlay end up being implemented more efficiently in the underlay. This is what happened with the Kubernetes overlays and SDNs such as Cisco ACI.
The question that many organisations ask is: should I incorporate a service mesh into my environment and adapt my developments to make use of it? There is no easy answer. The benefits are obvious, but there are some drawbacks to consider when making the decision:
- Immaturity: the technology used to implement a service mesh is relatively new, and some implementations still have very little track record.
- Team training: the learning curve for both development and operations teams is quite steep.
In most cases, a hybrid environment will be the best approach, with applications that can take advantage of the service mesh coexisting with more traditional applications that are not worth migrating to the new scheme. Over time, the proportion of applications that use the service mesh will gradually increase.
In the coming years, we will see all these disadvantages being left behind and the service mesh will become an essential element in the application architecture.