
How to Blow up a Kubernetes Cluster

2023-04-19

Authors: Felix Hoffmann


Summary

The presentation discusses resource management in Kubernetes from the perspective of an application developer, highlighting the importance of setting resource requests and limits appropriately to avoid cluster crashes and scheduling issues.
  • Resource management in Kubernetes involves setting CPU and memory requests and limits for containers
  • Exceeding a memory limit gets the container terminated (OOM-killed), while exceeding a CPU limit leads to throttling, not termination
  • Setting appropriate requests and limits is crucial for efficient scheduling and avoiding noisy neighbors
  • Developers should be aware of namespace-level limits and quotas, and of the resources actually available, when setting requests and limits
  • In general, it is advisable to set memory requests equal to memory limits and avoid setting CPU limits
  • Exceptions include workloads that need consistent, predictable CPU performance, and cases where overcommitting memory is deliberately preferred
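The enforcement rules in the list above can be sketched in a few lines. This is a simplified model, not Kubernetes code: quantities are plain integers in assumed units (millicores, MiB) instead of strings like "500m" or "512Mi", and the names Container, qos_class and cfs_quota_us are hypothetical. It shows two things. First, how the QoS class follows from requests and limits; note that the advice "memory request equal to memory limit, no CPU limit" yields a Burstable pod, because Guaranteed requires equal requests and limits for both CPU and memory in every container. Second, how a CPU limit turns into a CFS quota (runtime allowed per period, assuming the default 100 ms period), which is why CPU is throttled rather than killed.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical simplified units: CPU in millicores, memory in MiB.
# Real manifests use quantity strings such as "500m" or "512Mi".
RESOURCES = ("cpu", "memory")
CFS_PERIOD_US = 100_000  # assumed default CFS period for CPU limits: 100 ms


@dataclass
class Container:
    requests: Dict[str, int] = field(default_factory=dict)
    limits: Dict[str, int] = field(default_factory=dict)

    def effective_requests(self) -> Dict[str, int]:
        # A limit without a matching request defaults the request to the limit.
        merged = dict(self.limits)
        merged.update(self.requests)
        return merged


def qos_class(containers: List[Container]) -> str:
    """Approximate the QoS class Kubernetes assigns to a pod (CPU and memory only)."""
    if not any(r in c.requests or r in c.limits for c in containers for r in RESOURCES):
        return "BestEffort"
    # Guaranteed: every container has a limit for every resource, equal to its request.
    guaranteed = all(
        r in c.limits and c.effective_requests()[r] == c.limits[r]
        for c in containers
        for r in RESOURCES
    )
    return "Guaranteed" if guaranteed else "Burstable"


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) a container may use per period before being throttled."""
    return cpu_limit_millicores * period_us // 1000


if __name__ == "__main__":
    # Memory request == memory limit, no CPU limit (the advice above).
    advised = Container(requests={"cpu": 250, "memory": 512}, limits={"memory": 512})
    pinned = Container(limits={"cpu": 500, "memory": 512})  # requests default to limits
    print(qos_class([advised]))      # Burstable
    print(qos_class([pinned]))       # Guaranteed
    print(qos_class([Container()]))  # BestEffort
    print(cfs_quota_us(500))         # 50000, i.e. 50 ms of CPU per 100 ms period
```

Memory has no equivalent of throttling: once a container's usage crosses its memory limit, the kernel's OOM killer terminates it, and repeated kills surface as a crash loop.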
The speaker, an application developer, was tasked with setting resource limits on a Kubernetes cluster without prior knowledge of the system. Despite reading the documentation, the speaker set inappropriate limits, which caused the entire cluster to crash. This experience highlights the importance of understanding how Kubernetes handles scarce resources and of setting appropriate requests and limits to avoid similar issues.

Abstract

Last year, Felix was handed a Kubernetes cluster and told that some pods were using too much memory. He didn't have a clue about Kubernetes but quickly figured out that pods can be tamed by setting resource limits. Felix went and set limits, only to watch the entire cluster go haywire. Half of the pods were stuck in a crash loop; the other half were forever "pending". At first sight, resource requests and limits seem straightforward: a request is a lower bound for CPU or memory; a limit is an upper bound for CPU or memory. Once demand exceeds supply, though, it is imperative to know how Kubernetes handles scarce resources. How do these settings influence scheduling? Which pod gets terminated first? Felix learned these things the hard way. He is giving this talk so you don't have to repeat his mistakes.
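The two questions in the abstract, scheduling and termination order, can be illustrated with a rough model. This is a sketch under stated assumptions, not the real scheduler or kubelet: the Pod class, fits and eviction_order are hypothetical names, units are assumed integers (millicores, MiB), and the real scheduler applies many more filters (taints, affinity, ports). The fit check reflects that scheduling looks only at the sum of requests against a node's allocatable resources, never at limits or actual usage, so a pod that fits nowhere stays Pending. The ranking follows the documented order for the kubelet's node-pressure eviction on memory (usage above request first, then lower priority, then larger excess), which is separate from the kernel OOM killer that enforces a container's own memory limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical simplified units: CPU in millicores, memory in MiB.


@dataclass
class Pod:
    name: str
    requests: Dict[str, int] = field(default_factory=dict)
    priority: int = 0
    memory_usage: int = 0  # observed memory usage in MiB, used only for eviction


def fits(allocatable: Dict[str, int], scheduled: List[Pod], candidate: Pod) -> bool:
    """Scheduler-style fit check: sums requests, ignores limits and actual usage."""
    for resource, capacity in allocatable.items():
        already_requested = sum(p.requests.get(resource, 0) for p in scheduled)
        if already_requested + candidate.requests.get(resource, 0) > capacity:
            return False
    return True


def eviction_order(pods: List[Pod]) -> List[str]:
    """Rank pods for memory-pressure eviction (first name = evicted first).

    Pods using more memory than they requested come first, then lower
    priority, then larger usage above the request.
    """
    def key(p: Pod):
        excess = p.memory_usage - p.requests.get("memory", 0)
        return (excess <= 0, p.priority, -excess)

    return [p.name for p in sorted(pods, key=key)]


if __name__ == "__main__":
    node = {"cpu": 2000, "memory": 4096}
    running = [Pod("api", {"cpu": 1000, "memory": 2048})]
    print(fits(node, running, Pod("batch", {"cpu": 1500, "memory": 1024})))  # False: stays Pending
    print(fits(node, running, Pod("worker", {"cpu": 500, "memory": 1024})))  # True

    pods = [
        Pod("db", {"memory": 2048}, priority=100, memory_usage=1800),  # within its request
        Pod("web", {"memory": 512}, memory_usage=900),                  # 388 MiB over request
        Pod("job", {}, memory_usage=300),                               # no request at all
    ]
    print(eviction_order(pods))  # ['web', 'job', 'db']
```

In this model a pod with no memory request counts every byte it uses as excess, which is why pods without requests are among the first candidates once a node runs short of memory.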
