The presentation discusses the implementation of chaos engineering in cloud native environments using open source tools like Kraken, Litmus, and Chaos Mesh.
- Chaos engineering involves injecting real-world events into a system to test its resilience
- Hypotheses should be built around steady-state behavior and tested in production with minimal blast radius
- Cloud native environments offer a rich ecosystem of open source tools for implementing chaos engineering
- Kraken is an open source tool that injects failure into Kubernetes or OpenShift clusters using the PowerfulSeal and Cerberus components
- Kraken can be used for scenarios like pod chaos, node chaos, and time chaos
- Litmus and Chaos Mesh are other open source tools for implementing chaos engineering
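The points above about steady-state hypotheses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Experiment` class, its fields, and the simulated three-replica service are hypothetical and do not come from Kraken, Litmus, or Chaos Mesh.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical experiment description; not the API of any real tool."""
    name: str
    probe: Callable[[], float]    # measures the steady-state metric
    inject: Callable[[], None]    # introduces the failure
    rollback: Callable[[], None]  # undoes the failure
    tolerance: float              # allowed relative drop in the metric


def run(experiment: Experiment) -> bool:
    """Return True if the steady-state hypothesis held while the fault was active."""
    baseline = experiment.probe()
    experiment.inject()
    try:
        during = experiment.probe()
    finally:
        experiment.rollback()  # always restore, even if the probe raises
    return during >= baseline * (1 - experiment.tolerance)


# Simulated system (an assumption for illustration): a service with three
# replicas that can serve all traffic as long as two of them are healthy.
state = {"healthy": 3}


def success_rate() -> float:
    return min(1.0, state["healthy"] / 2)


def kill(count: int) -> Callable[[], None]:
    def _inject() -> None:
        state["healthy"] -= count
    return _inject


def restore() -> None:
    state["healthy"] = 3


one_pod = Experiment("kill one replica", success_rate, kill(1), restore, 0.05)
two_pods = Experiment("kill two replicas", success_rate, kill(2), restore, 0.05)
```

Here `run(one_pod)` reports that the hypothesis holds, while `run(two_pods)` reports a violation, since one remaining replica cannot serve the full load in this toy model. Killing a single replica first is the smaller blast radius.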
Kraken is used to inject failures into Kubernetes clusters, such as killing an etcd pod or simulating a crashed node, to test the cluster's resilience. Cerberus watches the cluster and reports on changes in the infrastructure. Kraken can also work with different cloud APIs, such as those of AWS and Azure.
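The interplay between an injector and a health monitor can be illustrated with a small Python sketch. It does not use Kraken's or Cerberus's actual interfaces: `pick_victims`, `run_scenarios`, and the `is_healthy` callback are hypothetical names, and the health check simply stands in for whatever signal a monitor would provide.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pick_victims(pods: Sequence[str], max_fraction: float,
                 rng: random.Random) -> List[str]:
    """Choose random pods to kill, capped at max_fraction of the total."""
    limit = int(len(pods) * max_fraction)
    return rng.sample(list(pods), limit)


def run_scenarios(scenarios: Sequence[Tuple[str, Callable[[], None]]],
                  is_healthy: Callable[[], bool]) -> List[str]:
    """Run scenarios in order, stopping as soon as the health check fails."""
    completed = []
    for name, inject in scenarios:
        if not is_healthy():
            break  # do not pile new failures onto an unhealthy cluster
        inject()
        completed.append(name)
    return completed
```

With three etcd pods and `max_fraction=0.34`, `pick_victims` selects one pod, which keeps the blast radius small. `run_scenarios` checks health before each injection, so a cluster that does not recover from one scenario is not hit by the next.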
Chaos engineering is becoming a standard way to test the resiliency and performance of cloud native applications. It allows you to validate assumptions, catch loopholes, and generally improve resiliency in your cluster. This talk will focus on the specific experiments that will improve cluster and application performance and cover tools in the ecosystem, including LitmusChaos, Kraken, and Chaos Mesh. At the end of the talk, the audience will understand the basics of experiments, be able to apply them in their orgs, and have code to run in their infrastructures.

Breakdown:
- The idea of an experiment
- Application Performance Ideas
- Identifying Cluster Latency Issues (k8s, k3s)
- Improving App SLO
- Fixing issues with app latency
- Detecting latency in Service Meshes
- Rightsizing cluster auto-scaling issues
- Ways we use Chaos Experiments in our companies
- Demo: Running a latency Chaos Experiment continuously in a CI pipeline on k3s
- Chaos Tooling review
- FAQs about Chaos Engineering
- Conclusions
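A latency experiment that runs continuously in a CI pipeline needs some pass/fail criterion. The sketch below is one possible shape for it, not the code from the talk's demo: the function names, the choice of the 95th percentile, and the SLO threshold are all assumptions made for illustration.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile of the latency samples (inclusive quantile method)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def gate(samples_ms: Sequence[float], slo_ms: float) -> int:
    """Return a process exit code: 0 if p95 latency meets the SLO, else 1."""
    return 0 if p95(samples_ms) <= slo_ms else 1
```

A CI job could collect request latencies while the fault is active and pass the result of `gate` to `sys.exit`, so that an SLO violation under chaos fails the pipeline like any other failing test.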