We’ve been talking about Istio and service mesh recently (follow along @christianposta for the latest), but one of the most important aspects of Istio is easy to gloss over: its ability to control the routing of traffic between services. With this fine-grained control of application-level traffic, we can do interesting resilience things like routing around failures, routing to different availability zones when necessary, etc. IMHO, more importantly, we can also control the flow of traffic for our deployments so we can reduce the risk of change to the system.
With a services architecture, our goal is to increase our ability to go faster, so we do things like implement microservices, automated testing pipelines, CI/CD, etc. But what good is any of this if we have bottlenecks getting our code changes into production? Production is where we learn whether our changes have any positive impact on our KPIs, so we should reduce the bottlenecks of getting code into production.
At the typical enterprise customers that I visit regularly (financial services, insurance, retail, energy, etc.), risk is a big part of the equation. Risk is used as a reason for blocking changes to production. A big part of this risk is that a code “deployment” is all or nothing in these environments. What I mean is that there is no separation of deployment and release. This is a hugely important distinction.
Deployment vs Release
A deployment brings new code to production but it takes no production traffic. Once in the production environment, service teams are free to run smoke tests, integration tests, etc without impacting any users. A service team should feel free to deploy as frequently as it wishes.
A release brings live traffic to a deployment but may require signoff from “the business stakeholders”. Ideally, bringing traffic to a deployment can be done in a controlled manner to reduce risk. For example, we may want to bring internal-user traffic to the deployment first. Or we may want to bring a small fraction, say 1%, of traffic to the deployment. If any of these release rollout strategies (internal, non-paying, 1% traffic, etc.) exhibits undesirable behavior (thus the need for strong observability), then we can roll back.
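To make the release side of this concrete, here is a minimal Python sketch of a rollout decision. It is not how Istio implements routing; the function names, the `is_internal` flag, and the hash-based bucketing are all assumptions made for illustration. Internal users see the new deployment first, and everyone else is split by a stable percentage.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def choose_version(user_id: str, is_internal: bool, percent_to_new: int) -> str:
    """Return "v2" if this user is released to the new deployment, else "v1".

    Hypothetical policy: internal users always get v2; other users get v2
    only when their bucket falls below percent_to_new (0 to 100).
    """
    if is_internal:
        return "v2"
    if bucket(user_id) < percent_to_new:
        return "v2"
    return "v1"
```

Because the bucket depends only on the user id, a given user keeps seeing the same version while the percentage stays fixed, and rolling back the external release is just a matter of setting `percent_to_new` back to 0.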
One strategy we can use to reduce risk for our releases, before we even expose them to any type of user, is to shadow live traffic to our deployment. With traffic shadowing, we can take a fraction of traffic and route it to our new deployment and observe how it behaves. We can do things like test for errors, exceptions, performance, and result parity. Projects such as Twitter Diffy can be used to do comparisons between released and unreleased versions.
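As a rough illustration of what a result-parity check can look like, here is a small Python sketch that compares a response from the released version with the response the shadowed deployment produced for the same request. This is not Diffy's implementation; the `diff_fields` name, the dict-shaped responses, and the `ignore` set for noisy fields such as timestamps are assumptions for the example.

```python
from typing import Any, Dict, Iterable, Tuple


def diff_fields(
    primary: Dict[str, Any],
    candidate: Dict[str, Any],
    ignore: Iterable[str] = (),
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (primary_value, candidate_value)} for each differing field.

    Only top-level fields are compared. A field missing on one side is
    reported with None for that side. Fields listed in `ignore` are skipped.
    """
    skipped = set(ignore)
    differences = {}
    for key in sorted((set(primary) | set(candidate)) - skipped):
        left = primary.get(key)
        right = candidate.get(key)
        if left != right:
            differences[key] = (left, right)
    return differences
```

An empty result means the two versions agreed on every field we care about; anything else is a candidate regression to look at before releasing.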
With Istio, we can do this kind of traffic control by mirroring traffic from one service to another. Let’s take a look at an example.
Traffic Mirroring with Istio
With the Istio 0.5.0 release we have the ability to mirror traffic from one service to another, or from one version to a newer version.
We’ll start by creating two deployments of an httpbin service.
We’ll inject the Istio sidecar with kube-inject like this:
Version 2 of the httpbin service is similar except it has labels that denote that it’s version 2:
Let’s deploy httpbin-v2 also:
Lastly, let’s deploy the sleep demo from Istio samples so we can easily call into our httpbin service:
You should see three pods like this:
If we start sending traffic to the httpbin service, we’ll see the default Kubernetes behavior to load balance across both v1 and v2 since both pods will match the selector for the httpbin Kubernetes Service. Let’s take a look at the default Istio route rule to route all traffic to v1 of our service:
Let’s create this route rule:
If we start sending traffic into our httpbin service, we should see traffic only for httpbin-v1.
If we check the access logs for the httpbin-v1 service, we should see a single access-log statement:
If we check the logs for the httpbin-v2 service, we should see NO access log statements.
Let’s mirror traffic from v1 to v2. Here’s the Istio route rule we’ll use:
A few things to note:
- We are explicitly telling Istio to weight the traffic between v1 (100%) and v2 (0%)
- We are using labels to specify the version of the httpbin service to which we want to mirror
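To show what these two notes mean for a single request, here is a small Python sketch of the behavior. It only models the semantics and is not Istio or Envoy code; `route_with_mirror`, the backend callables, and the synchronous mirror call are assumptions for illustration (a real proxy sends the mirrored copy out of band). The caller always gets its response from the weighted live route, while a copy of the request goes to the mirror and whatever the mirror returns is dropped.

```python
import random
from typing import Callable, Dict, Optional

Backend = Callable[[str], str]


def route_with_mirror(
    request: str,
    backends: Dict[str, Backend],
    weights: Dict[str, int],
    mirror: Optional[str] = None,
    rng=random,
) -> str:
    """Serve the request from one weighted backend and copy it to the mirror.

    Backends with weight 0 never serve live traffic. The mirrored call is
    fire-and-forget: its response and any error are ignored.
    """
    names = [name for name, weight in weights.items() if weight > 0]
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    response = backends[chosen](request)

    if mirror is not None:
        try:
            backends[mirror](request)  # mirrored response is intentionally dropped
        except Exception:
            pass  # a failing mirror must not affect the live request

    return response
```

With weights of 100 for v1 and 0 for v2 and the mirror set to v2, every caller is answered by v1 while v2 still sees each request, which is the behavior the access logs should confirm.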
Let’s create this route rule:
We should see routerules like this:
Now if we start sending traffic in, we should see requests go to v1 and requests shadowed to v2.
Here’s a video showing this:
Please see the official Istio docs for more details!