The presentation discusses the challenges of testing Kubernetes controllers in the presence of distributed systems faults and introduces an automated testing tool called Sieve to address these challenges.
- Kubernetes controllers are critical for extending Kubernetes with new capabilities
- Controllers are just one component in a complex distributed system and are susceptible to various kinds of faults
- It is difficult to make controller code robust to these faults
- Sieve is an automated testing tool that systematically tests Kubernetes controllers to harden them against faults
- Sieve has already discovered and led to fixes for safety-critical bugs in popular Kubernetes controllers
The presentation uses the example of a controller managing a new application on Kubernetes to illustrate the challenges of testing controllers in a distributed system. The controller must reconcile the application's desired state with its current state while also interacting with built-in and third-party controllers. A mistake made by the controller in the presence of faults can have severe consequences, such as accidentally deleting volumes or StatefulSets. Sieve is introduced as a tool that helps controller developers test their code against these scenarios.
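To make the idea of reconciliation concrete, here is a minimal sketch of one reconcile step. It is not taken from the talk or from any real controller. The `Action` type, the `reconcile` function, and the volume names are hypothetical, and both states are modeled as plain sets of object names rather than real Kubernetes objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical change the controller would request from the API server."""
    verb: str  # "create" or "delete"
    name: str  # name of the object to act on


def reconcile(desired: set[str], current: set[str]) -> list[Action]:
    """Return the actions needed to move `current` toward `desired`."""
    # Objects that should exist but do not yet.
    creates = [Action("create", n) for n in sorted(desired - current)]
    # Objects that exist but are no longer wanted.
    deletes = [Action("delete", n) for n in sorted(current - desired)]
    return creates + deletes


# Illustrative inputs only: the user wants pvc-0 and pvc-1,
# while the cluster currently holds pvc-1 and pvc-2.
actions = reconcile({"pvc-0", "pvc-1"}, {"pvc-1", "pvc-2"})
```

In this sketch the delete decision depends entirely on the two views passed in, so it is only as safe as those views are accurate and up to date.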
The Kubernetes ecosystem has thousands of controller implementations for different applications and platform capabilities. A controller’s correctness is therefore critical, yet it can be compromised by many factors, such as asynchrony in the overall distributed system, unexpected failures, networking issues, and controller restarts. These in turn can lead to severe safety violations, such as incorrectly deleting StatefulSets and PVCs. Unfortunately, controller developers lack automated testing tools to harden their code against these conditions. In this talk, Xudong Sun and Lalith Suresh will describe common bug patterns in Kubernetes controllers. They will also present an automated testing tool called Sieve, which systematically tests Kubernetes controllers to harden them against these scenarios. Sieve has already discovered (and led to fixes for) several safety-critical bugs in popular Kubernetes controllers for ZooKeeper, Cassandra, RabbitMQ, MongoDB, XtraDB, and others.
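As a rough illustration of how a restart or asynchrony could lead to an incorrect deletion, the sketch below simulates a controller that plans deletions from an outdated view of the desired state. It is a toy model under stated assumptions, not Sieve's implementation and not a bug from any specific controller. The `plan_deletions` function, the `history` list, and the PVC names are all hypothetical.

```python
def plan_deletions(observed_desired, existing_volumes):
    """Return the volumes a controller would delete, given the view it observed."""
    # Anything that exists but is absent from the observed desired state
    # looks like garbage to the controller.
    return sorted(existing_volumes - observed_desired)


# Hypothetical versions of the desired state, oldest first:
# the application was scaled up from one volume to two.
history = [
    {"pvc-0"},           # older version
    {"pvc-0", "pvc-1"},  # latest version
]

# Volumes that actually exist in the cluster right now.
existing = {"pvc-0", "pvc-1"}

# A controller with an up-to-date view deletes nothing.
fresh_plan = plan_deletions(history[-1], existing)

# A controller that (for example, after a restart) observes the older
# version would wrongly plan to delete a volume that is still in use.
stale_plan = plan_deletions(history[0], existing)
```

The decision logic is identical in both cases. Only the freshness of the observed state differs, which is why such mistakes are hard to catch without tests that deliberately perturb what the controller sees.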