73,000 Pods a Day, Lessons From Misadventures In Multi-Tenant

Conference: KubeCon + CloudNativeCon North America 2022

2022-10-26

Authors: Shane Corbett, Wil Reed

Summary

Lessons learned from misadventures in running a large-scale multi-tenant Kubernetes cluster in production

Misapplying Kubernetes concepts to Linux performance rules is a big mistake
Thinking in cores can be dangerous, as Linux thinks in time
A CPU value configured in cores is actually converted into scheduler time
Properly scaling on the right metric can greatly simplify cluster setup and reduce churn
Measuring what's going on is necessary to understand best practices for a cluster
Prometheus is a good tool for measuring cluster performance
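The point about cores converting into time can be made concrete with a little arithmetic. The sketch below is not taken from the talk. It assumes the standard Linux CFS bandwidth-control behavior, where a container CPU limit becomes a quota of CPU time per scheduling period (100 ms by default). The function names and the `busy_threads` parameter are hypothetical, chosen for illustration, and the model assumes every busy thread has a free core to run on.

```python
# Illustrative sketch: how a CPU limit expressed in "cores" turns into time.
# Assumes the default CFS period of 100 ms; names here are hypothetical.

CFS_PERIOD_US = 100_000  # default CFS scheduling period, in microseconds


def cpu_limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores into CPU-time quota per period (us)."""
    return limit_millicores * period_us // 1000


def throttled_after_ms(limit_millicores: int, busy_threads: int,
                       period_us: int = CFS_PERIOD_US):
    """Wall-clock ms into a period at which the quota runs out.

    Assumes each busy thread runs on its own core at full speed.
    Returns None when the quota lasts the whole period (no throttling).
    """
    quota_us = cpu_limit_to_quota_us(limit_millicores, period_us)
    burn_time_us = quota_us / busy_threads  # threads consume quota in parallel
    if burn_time_us >= period_us:
        return None
    return burn_time_us / 1000


if __name__ == "__main__":
    # A "2 core" limit is really 200 ms of CPU time per 100 ms period.
    print(cpu_limit_to_quota_us(2000))
    # With 8 busy threads that budget is spent early in the period,
    # and the container is throttled until the next period starts.
    print(throttled_after_ms(2000, 8))
```

Under these assumptions, a limit of 2 cores does not reserve two processors. It grants a time budget that a highly parallel workload can exhaust well before the period ends, which is one way that reasoning in cores can mislead.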

The two speakers spent over two years studying Linux kernel performance and building custom monitoring dashboards in order to run a large-scale multi-tenant application in production. They discovered that some of the things they had believed were best practices were actually holding them back the most. By focusing on the fundamentals and measuring what was actually happening, they were able to greatly simplify their cluster setup and reduce churn. They also found Prometheus to be a good tool for measuring cluster performance.

Abstract

We spent over two years poring over 800-page Linux kernel performance books, tweaking obscure control plane settings, and developing detailed custom monitoring dashboards so you don’t have to! We found there is a large delta between what we learned in CKA training and the layer upon layer of hard-fought knowledge it takes to run a large-scale multi-tenant application in production. Join us as we take you through real-world findings that took months of research to fully understand, and provide evidence that some of the things we were convinced were best practices were the very things holding us back the most.

