Live Experiments with K8s Applications: Pitfalls and How to Avoid Them

Authors:   Fabio Oliveira, Srinivasan Parthasarathy


Summary

The presentation discusses the value of live experimentation with Kubernetes applications and the principles of trust in release automation. It emphasizes the need for accuracy and repeatability in data-driven solutions and the consideration of both success criteria and business reward metrics in delivering business results.
  • Live experimentation with Kubernetes applications can bring value to businesses
  • Common practices such as canary releases, A/B testing, conformance tests, and dark launches can be framed as live experiments
  • Fully automated code-release solutions must be data-driven, accurate, and repeatable
  • Such a solution needs statistical rigor: it must distinguish noise from actual code behavior and adjust the traffic split based on statistically sound version assessments
  • Both success criteria and business reward metrics should be considered in delivering business results
  • A/B and A/B/n experiments should incorporate service level objectives and progressively shift traffic toward the winning version
  • Sophisticated algorithms are needed to compare and assess all versions and to decide when a winner can be declared with statistical confidence

In order to deliver business results through a code release, it is important to consider both success criteria and business reward metrics. For example, a company may release a new version of its app with the goal of increasing user engagement and, ultimately, revenue. By tracking metrics such as conversion rate and mean latency, it can determine which version best achieves these goals. However, it is also important that the data-driven solution making these assessments is accurate and repeatable, that it distinguishes noise from actual code behavior, and that it adjusts the traffic split accordingly, as sketched below.
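
The points above can be made concrete with a small, self-contained sketch. This is not the speakers' tooling: the version names, metric values, latency SLO, and confidence threshold are all hypothetical, and the conversion counts that would normally come from a metrics backend are hard-coded. The sketch checks the success criterion (a latency SLO) first, then uses a one-sided two-proportion z-test on the business reward metric (conversion rate) to decide whether a winner can be declared with statistical confidence.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical per-version observations gathered during the experiment.
# In a real setup these would be queried from a metrics backend such as Prometheus.
versions = {
    "baseline":  {"requests": 4200, "conversions": 315, "mean_latency_ms": 182.0},
    "candidate": {"requests": 4150, "conversions": 352, "mean_latency_ms": 175.0},
}

LATENCY_SLO_MS = 200.0   # success criterion: mean latency must stay under the SLO
CONFIDENCE = 0.95        # required confidence before declaring a winner


def satisfies_slo(stats):
    """A version is only eligible to win if it meets the latency SLO."""
    return stats["mean_latency_ms"] <= LATENCY_SLO_MS


def conversion_rate(stats):
    return stats["conversions"] / stats["requests"]


def candidate_beats_baseline(base, cand, confidence=CONFIDENCE):
    """One-sided two-proportion z-test on the business reward (conversion rate)."""
    p1, n1 = conversion_rate(base), base["requests"]
    p2, n2 = conversion_rate(cand), cand["requests"]
    pooled = (base["conversions"] + cand["conversions"]) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Declare the candidate better only if the observed lift is unlikely to be noise.
    return NormalDist().cdf(z) >= confidence


base, cand = versions["baseline"], versions["candidate"]
if not satisfies_slo(cand):
    print("candidate violates the latency SLO -> roll back")
elif candidate_beats_baseline(base, cand):
    print("candidate wins with statistical confidence -> promote it")
else:
    print("no statistically confident winner yet -> keep the experiment running")
```

A frequentist test is only one choice; Bayesian posterior comparisons fit the same structure. Either way, the point is that an explicit decision rule, not eyeballing a dashboard, determines when a winner is declared.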

Abstract

Your K8s apps are instrumented for observability. You are using ingress controllers/service meshes in your production K8s cluster and can shift traffic between different versions of your app. You wish to take your CI/CD to the next level by introducing metrics-driven automated rollouts using live experiments like canary, A/B, and A/B/n comparisons. What could go wrong? We demonstrate how subtle differences in the design of the experiment---how metrics are collected, queried, and used; the traffic shifting policy; the number of requests sent to different versions during the experiment and its duration; and when/how it is terminated---can lead to dramatically different outcomes, and in turn, directly impact the version of the app chosen to run in production. We also discuss simple and statistically effective remedies for the above problem, so that experiments become repeatable and their outcomes are more accurate and trustworthy.
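
The design choices called out in the abstract (the traffic-shifting policy, the number of requests seen by each version, the experiment duration, and when/how it terminates) can also be sketched as a control loop. Everything below is hypothetical: metric queries are simulated rather than issued against Prometheus, the weight updates only print instead of patching an ingress or service-mesh resource, and the step size, window size, and data thresholds are illustrative.

```python
import random
import time

MAX_ITERATIONS = 20      # bounds the experiment duration
INTERVAL_SECONDS = 0     # e.g. 60 in a real cluster; 0 keeps this demo instant
STEP_PERCENT = 10        # bounded shift per iteration, so one noisy window
                         # cannot move all traffic at once
MIN_REQUESTS = 3000      # minimum data per version before declaring a winner

weights = {"baseline": 90, "candidate": 10}
observed = {v: {"requests": 0, "conversions": 0} for v in weights}
TRUE_RATE = {"baseline": 0.075, "candidate": 0.085}  # simulated ground truth


def query_metrics(version, window_requests=1000):
    """Stand-in for querying a metrics backend (e.g., Prometheus) over the last
    window; conversions are simulated in proportion to the current traffic split."""
    n = window_requests * weights[version] // 100
    conversions = sum(random.random() < TRUE_RATE[version] for _ in range(n))
    observed[version]["requests"] += n
    observed[version]["conversions"] += conversions
    return observed[version]


def conversion_rate(stats):
    return stats["conversions"] / max(stats["requests"], 1)


def assess(base, cand):
    """Simplified assessment: require enough data, then compare reward metrics.
    A rigorous controller would use a statistical test like the one sketched above."""
    if min(base["requests"], cand["requests"]) < MIN_REQUESTS:
        return None
    return "candidate" if conversion_rate(cand) > conversion_rate(base) else "baseline"


def apply_weights(w):
    """Stand-in for patching ingress-controller / service-mesh route weights."""
    print(f"traffic split -> {w}")


for _ in range(MAX_ITERATIONS):
    base = query_metrics("baseline")
    cand = query_metrics("candidate")
    winner = assess(base, cand)
    if winner is not None:
        # Terminate only once the assessment yields a winner; send it all traffic.
        loser = "baseline" if winner == "candidate" else "candidate"
        weights = {winner: 100, loser: 0}
        apply_weights(weights)
        break
    # No winner yet: nudge traffic toward the currently better-looking version.
    leader = "candidate" if conversion_rate(cand) > conversion_rate(base) else "baseline"
    other = "baseline" if leader == "candidate" else "candidate"
    weights[leader] = min(100, weights[leader] + STEP_PERCENT)
    weights[other] = 100 - weights[leader]
    apply_weights(weights)
    time.sleep(INTERVAL_SECONDS)
else:
    # Duration limit reached without a confident decision: keep the baseline.
    apply_weights({"baseline": 100, "candidate": 0})
```

Bounding the per-iteration step and requiring a minimum amount of data before termination are the kinds of guardrails that make an experiment repeatable, rather than hostage to a single noisy window.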
