Power Level 9000! Improving Application Performance with Chaos Engineering

Conference: KubeCon + CloudNativeCon Europe 2021

Authors: Karthik Gaekwad, Saiyam Pathak

Summary

The presentation discusses the implementation of chaos engineering in cloud native environments using open source tools like Kraken, Litmus, and Chaos Mesh.

Chaos engineering involves injecting real-world events into a system to test its resilience
Hypotheses should be built around steady-state behavior and tested in production with minimal blast radius
Cloud native environments offer a rich ecosystem of open source tools for implementing chaos engineering
Kraken is an open source tool that injects failures into Kubernetes or OpenShift clusters using the PowerfulSeal and Cerberus components
Kraken can be used for scenarios like pod chaos, node chaos, and time chaos
Litmus and Chaos Mesh are other open source tools for implementing chaos engineering
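
The experiment structure summarized above (state a steady-state hypothesis, inject a fault with a small blast radius, then re-check the hypothesis) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Kraken, Litmus, or Chaos Mesh are implemented: `ExperimentResult`, `run_experiment`, and `FakeService` are hypothetical names, and the fault is simulated in memory, not injected into a real cluster.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExperimentResult:
    steady_before: bool
    steady_during: Optional[bool]  # None if the fault was never injected
    steady_after: Optional[bool]

    @property
    def hypothesis_held(self) -> bool:
        # The hypothesis holds only if steady state was observed at every step.
        return (
            self.steady_before
            and self.steady_during is True
            and self.steady_after is True
        )


def run_experiment(
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    """Check steady state, inject a fault, re-check, then always roll back."""
    if not steady_state():
        # Do not add chaos to a system that is already unhealthy.
        return ExperimentResult(False, None, None)
    during = False
    try:
        inject()
        during = steady_state()
    finally:
        rollback()
    return ExperimentResult(True, during, steady_state())


class FakeService:
    """Hypothetical stand-in for a replicated service (assumption, not a real cluster)."""

    def __init__(self, replicas: int) -> None:
        self.total = replicas
        self.healthy = replicas

    def kill_one(self) -> None:
        # Blast radius of one replica per experiment.
        self.healthy = max(0, self.healthy - 1)

    def restore(self) -> None:
        self.healthy = self.total

    def serving(self) -> bool:
        # Assumed steady state: at least one healthy replica can serve traffic.
        return self.healthy >= 1


if __name__ == "__main__":
    svc = FakeService(replicas=3)
    result = run_experiment(svc.serving, svc.kill_one, svc.restore)
    print("hypothesis held:", result.hypothesis_held)
```

Keeping the rollback in a `finally` block means the simulated fault is undone even if the steady-state check raises, which mirrors the goal of keeping the blast radius small.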

Kraken is used to inject failures into Kubernetes clusters, such as killing an etcd pod or simulating a crashed node, to test the cluster's resilience. Cerberus watches the infrastructure and reports on changes in its health. Kraken can also work with different cloud APIs like AWS and Azure.
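
As a rough illustration of the pod-kill scenario and the watch-and-report role described above, the sketch below selects a capped number of victim pods and then computes a simple go/no-go health signal. Everything here is an assumption made for illustration: `Pod`, `pick_victims`, and `health_signal` are hypothetical names, the pods are plain in-memory records, and none of this is the actual Kraken or Cerberus API or configuration format.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    """Hypothetical in-memory pod record (not a Kubernetes client object)."""
    name: str
    namespace: str
    labels: Dict[str, str]
    ready: bool = True


def pick_victims(
    pods: List[Pod],
    namespace: str,
    selector: Dict[str, str],
    max_kill: int,
    rng: random.Random,
) -> List[Pod]:
    """Pick at most max_kill pods that match the namespace and every label in selector."""
    matching = [
        p
        for p in pods
        if p.namespace == namespace
        and all(p.labels.get(k) == v for k, v in selector.items())
    ]
    # Cap the blast radius: never select more pods than requested or available.
    count = max(0, min(max_kill, len(matching)))
    return rng.sample(matching, count)


def health_signal(pods: List[Pod], namespace: str, min_ready: int) -> bool:
    """Go/no-go signal: True while enough pods in the namespace are still ready."""
    ready = sum(1 for p in pods if p.namespace == namespace and p.ready)
    return ready >= min_ready


if __name__ == "__main__":
    cluster = [
        Pod(f"etcd-{i}", "kube-system", {"component": "etcd"}) for i in range(3)
    ]
    for victim in pick_victims(
        cluster, "kube-system", {"component": "etcd"}, 1, random.Random()
    ):
        victim.ready = False  # simulated kill
        print("killed", victim.name)
    # Assumption: 2 of 3 etcd members must stay ready to keep quorum.
    print("healthy:", health_signal(cluster, "kube-system", min_ready=2))
```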

Abstract

Chaos engineering is becoming a standard way to test the resiliency and performance of cloud native applications. It allows you to validate assumptions, catch loopholes, and generally improve resiliency in your cluster. This talk will focus on the specific experiments that will improve cluster and application performance and cover tools in the ecosystem, including LitmusChaos, Kraken, and Chaos Mesh. At the end of the talk, the audience will understand the basics of experiments, be able to apply them in their orgs, and have code to run in their infrastructures.

Breakdown:

The idea of an experiment
Application Performance Ideas
Identifying Cluster Latency Issues (k8s, k3s)
Improving App SLO
Fixing issues with app latency
Detecting latency in Service Meshes
Rightsizing cluster auto-scaling issues
Ways we use Chaos Experiments in our companies
Demo: Running a latency Chaos Experiment continuously in a CI pipeline on k3s
Chaos Tooling review
FAQs about Chaos Engineering
Conclusions
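
The demo item in the breakdown above (a latency experiment run continuously in a CI pipeline) implies some pass/fail check on measured latency. The sketch below shows one way such a gate could look; it is an assumption for illustration, not the code shown in the talk. The names `percentile` and `latency_gate`, the nearest-rank percentile method, and the 95th-percentile default are all hypothetical choices.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples, with pct in the range (0, 100]."""
    if not samples:
        raise ValueError("no latency samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_gate(samples_ms: Sequence[float], slo_ms: float, pct: float = 95.0) -> bool:
    """Return True when the chosen latency percentile is within the SLO."""
    return percentile(samples_ms, pct) <= slo_ms


if __name__ == "__main__":
    # Hypothetical latencies (ms) recorded while a fault was being injected.
    during_chaos = [12.0, 15.0, 14.0, 18.0, 250.0, 16.0, 13.0, 17.0]
    ok = latency_gate(during_chaos, slo_ms=100.0)
    print("within SLO:", ok)
```

In a pipeline, the job would typically exit with a non-zero status when the gate returns `False`, so that a latency regression under chaos fails the build.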

Materials:

Slides
