Preventing Controller Sprawl From Taking Down Your Cluster - When a Scalable Pattern Stops Being Scalable

Conference: KubeCon + CloudNativeCon North America 2022

2022-10-28

Authors: Madhu C.S.

Summary

Best practices for managing Kubernetes clusters and extensions

Construct dashboards to make important metrics visible and accessible
Train teams to understand logs and use them for debugging
Have visibility into changes made to the system
Work closely with partner teams for writing extensions
Read the code to detect bugs and understand the system

The speaker also described using audit logs as a powerful debugging tool: in one case study, the audit logs revealed a very large number of requests coming from the kubelet.
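As a rough illustration of that kind of audit-log triage (not the speaker's actual tooling), the sketch below tallies requests per user, verb, and resource from JSON-lines audit events. It assumes the API server writes `audit.k8s.io/v1` events one per line, and it counts only the `ResponseComplete` stage so that each request is counted once. The function name `top_requesters` and the sample events are hypothetical.

```python
import json
from collections import Counter


def top_requesters(audit_lines, n=5):
    """Return the n most frequent (username, verb, resource) triples.

    Assumes each line is one JSON-encoded audit.k8s.io/v1 Event.
    """
    counts = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # A single request can be logged at several stages; keep only the
        # final one so each request is counted once.
        if event.get("stage") != "ResponseComplete":
            continue
        user = (event.get("user") or {}).get("username", "<unknown>")
        verb = event.get("verb", "<unknown>")
        # objectRef is absent for non-resource requests such as /healthz.
        resource = (event.get("objectRef") or {}).get("resource", "<none>")
        counts[(user, verb, resource)] += 1
    return counts.most_common(n)


def _event(username, verb, resource, stage="ResponseComplete"):
    """Build a made-up audit event line for the demo below."""
    return json.dumps({
        "kind": "Event",
        "apiVersion": "audit.k8s.io/v1",
        "stage": stage,
        "verb": verb,
        "user": {"username": username},
        "objectRef": {"resource": resource},
    })


# Hypothetical sample data: a kubelet identity issuing repeated GETs.
sample_lines = [
    _event("system:node:node-a", "get", "secrets"),
    _event("system:node:node-a", "get", "secrets"),
    _event("system:node:node-a", "get", "secrets"),
    _event("system:node:node-a", "get", "secrets", stage="RequestReceived"),
    _event("system:serviceaccount:default:demo-controller", "update", "configmaps"),
]

for (user, verb, resource), count in top_requesters(sample_lines):
    print(f"{count:5d}  {user}  {verb}  {resource}")
```

In a real cluster the input would be the audit log file configured on the API server rather than an in-memory list, and grouping by the `userAgent` field instead of the username may be more useful when many components share an identity.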

Abstract

The vast majority of Kubernetes controllers make use of a WATCH and UPDATE pattern, which is a highly scalable client-pull based pattern. “Highly” does not mean “infinite”, and the spread of this pattern has led to a number of implicit design guarantees that operators build on. In this talk, the Container Orchestration team at Robinhood will cover the exploration of the boundaries of this pattern, how second order effects result in service degradation in production, and best practices for monitoring, detecting, debugging and addressing these issues. With examples drawn from real outages, the team will present lessons learned for organizations of all sizes.
