Traffic Shadowing With Istio: Reducing the Risk of Code Release

We’ve been talking about Istio and service mesh recently (follow along @christianposta for the latest), but one aspect of Istio is easy to gloss over. One of the most important capabilities of Istio is its ability to control the routing of traffic between services. With this fine-grained control of application-level traffic, we can do interesting things for resilience, like routing around failures or routing to different availability zones when necessary. More importantly, IMHO, we can also control the flow of traffic for our deployments so we can reduce the risk of change to the system.

With a services architecture, our goal is to increase our ability to go faster, so we do things like implement microservices, automated testing pipelines, CI/CD, etc. But what good is any of this if we have bottlenecks getting our code changes into production? Production is where we learn whether our changes have any positive impact on our KPIs, so we should reduce the bottlenecks to getting code into production.

At the typical enterprise customers that I visit regularly (financial services, insurance, retail, energy, etc.), risk is a big part of the equation. Risk is used as a reason why changes to production get blocked. A big part of this risk is that a code “deployment” is all or nothing in these environments. What I mean is that there is no separation of deployment and release. This is a hugely important distinction.

Deployment vs Release

A deployment brings new code to production but it takes no production traffic. Once in the production environment, service teams are free to run smoke tests, integration tests, etc without impacting any users. A service team should feel free to deploy as frequently as it wishes.

A release brings live traffic to a deployment but may require signoff from “the business stakeholders”. Ideally, bringing traffic to a deployment can be done in a controlled manner to reduce risk. For example, we may want to bring internal-user traffic to the deployment first. Or we may want to bring a small fraction, say 1%, of traffic to the deployment. If any of these release rollout strategies (internal, non-paying, 1% traffic, etc.) exhibits undesirable behavior (thus the need for strong observability), then we can roll back.

Please go read the two-part series titled “Deploy != Release” from the good folks at Turbine.io labs for a deeper treatment of this topic.

Dark traffic

One strategy we can use to reduce risk for our releases, before we even expose them to any type of user, is to shadow live traffic to our deployment. With traffic shadowing, we can take a fraction of traffic, route a copy of it to our new deployment, and observe how it behaves. We can do things like test for errors, exceptions, performance, and result parity. Projects such as Twitter Diffy can be used to compare released versions against unreleased versions.

With Istio, we can do this kind of traffic control by mirroring traffic from one service to another. Let’s take a look at an example.

Traffic Mirroring with Istio

With the Istio 0.5.0 release we have the ability to mirror traffic from one service to another, or from one version to a newer version.

We’ll start by creating two deployments of an httpbin service.

$  cat httpbin-v1.yaml
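
A manifest along these lines would produce the pods and access logs shown later in this post. Treat it as a minimal sketch rather than the exact file: the image name, the gunicorn command, and the `apps/v1` API version are assumptions, while the `app`/`version` labels, the container name `httpbin`, and port 8080 follow from the commands and output in this walkthrough.

```yaml
# Hypothetical httpbin-v1.yaml: a Service shared by all versions plus a v1 Deployment.
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8080
  selector:
    app: httpbin          # matches both v1 and v2 pods
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1       # the label that Istio route rules select on
    spec:
      containers:
      - name: httpbin
        image: docker.io/kennethreitz/httpbin   # assumed image
        imagePullPolicy: IfNotPresent
        # Assumed command: log each request to stdout so `kubectl logs` shows it.
        command: ["gunicorn", "--access-logfile", "-", "-b", "0.0.0.0:8080", "httpbin:app"]
        ports:
        - containerPort: 8080
```

Because the Service selects only on `app: httpbin`, any pod carrying that label receives traffic regardless of its `version` label, which is why a route rule is needed to pin traffic to v1.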

We’ll inject the Istio sidecar with kube-inject like this:

$  kubectl create -f <(istioctl kube-inject -f httpbin-v1.yaml)

Version 2 of the httpbin service is similar except it has labels that denote that it’s version 2:

$  cat httpbin-v2.yaml

Let’s deploy httpbin-v2 also:

$  kubectl create -f <(istioctl kube-inject -f httpbin-v2.yaml)

Lastly, let’s deploy the sleep demo from Istio samples so we can easily call into our httpbin service:

$  kubectl create -f <(istioctl kube-inject -f sleep.yaml)

You should see three pods like this:

$  kubectl get pod
NAME                          READY     STATUS    RESTARTS   AGE
httpbin-v1-2113278084-98whj   2/2       Running   0          1d
httpbin-v2-2839546783-2dvhq   2/2       Running   0          1d
sleep-1512692991-txrfn        2/2       Running   0          1d

If we start sending traffic to the httpbin service, we’ll see the default Kubernetes behavior of load balancing across both v1 and v2, since both pods match the selector for the httpbin Kubernetes Service. Let’s take a look at a default Istio route rule that routes all traffic to v1 of our service:
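
A rule of this kind might look like the following minimal sketch. It assumes the `config.istio.io/v1alpha2` RouteRule schema from the Istio 0.5 era; the rule name is taken from the `istioctl get routerules` output later in this post, and the precedence value is an arbitrary choice.

```yaml
# Sketch of routerules/all-httpbin-v1.yaml (contents assumed, not the exact file).
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: httpbin-default-v1
spec:
  destination:
    name: httpbin         # the Kubernetes Service name
  precedence: 5           # assumed value; higher precedence rules win
  route:
  - labels:
      version: v1         # send all traffic to pods labeled version=v1
```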

Let’s create this routerule:

$  istioctl create -f routerules/all-httpbin-v1.yaml

If we start sending traffic into our httpbin service, we should only see traffic for the httpbin-v1 deployment:

export SLEEP_POD=$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})
kubectl exec -it $SLEEP_POD -c sleep -- sh -c 'curl  http://httpbin:8080/headers'

{
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "0", 
    "Host": "httpbin:8080", 
    "User-Agent": "curl/7.35.0", 
    "X-B3-Sampled": "1", 
    "X-B3-Spanid": "eca3d7ed8f2e6a0a", 
    "X-B3-Traceid": "eca3d7ed8f2e6a0a", 
    "X-Ot-Span-Context": "eca3d7ed8f2e6a0a;eca3d7ed8f2e6a0a;0000000000000000"
  }
}

If we check the access logs for the httpbin-v1 service, we should see a single access-log statement:

$  kubectl logs -f httpbin-v1-2113278084-98whj -c httpbin 
127.0.0.1 - - [07/Feb/2018:00:07:39 +0000] "GET /headers HTTP/1.1" 200 349 "-" "curl/7.35.0"

If we check the logs for the httpbin-v2 service, we should see NO access log statements.

Let’s mirror traffic from v1 to v2. Here’s the Istio route rule we’ll use:
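
A mirroring rule could be sketched as follows. This again assumes the v1alpha2 RouteRule schema and its `mirror` field as introduced around Istio 0.5; the rule name comes from the `istioctl get routerules` output below, and the precedence value is an assumption chosen to outrank the default rule.

```yaml
# Sketch of a mirroring rule (contents assumed, not the exact file).
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: httpbin-mirror-v2
spec:
  destination:
    name: httpbin
  precedence: 11          # assumed; must be higher than the default v1 rule
  route:
  - labels:
      version: v1
    weight: 100           # all live traffic is served by v1
  - labels:
      version: v2
    weight: 0             # v2 serves no live traffic
  mirror:
    name: httpbin
    labels:
      version: v2         # a copy of each request is sent to v2
```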

A few things to note:

We explicitly tell Istio to weight the traffic between v1 (100%) and v2 (0%), so v2 serves no live requests
We use labels to specify which version of the httpbin service we want to mirror traffic to

Let’s create this routerule:

$  istioctl create -f routerules/mirror/mirror-traffic-to-httbin-v2.yaml

We should see routerules like this:

$  istioctl get routerules
NAME                    KIND                                    NAMESPACE
httpbin-default-v1      RouteRule.v1alpha2.config.istio.io      tutorial
httpbin-mirror-v2       RouteRule.v1alpha2.config.istio.io      tutorial

Now if we start sending traffic in, we should see requests go to v1 and requests shadowed to v2.

Video demo

Here’s a video showing this:

Istio Mirroring Demo from Christian Posta on Vimeo.

Please see the official Istio docs for more details!

Christian Posta
