The presentation discusses the challenges of running large Kubernetes clusters and offers best practices for overcoming them. It emphasizes using informers instead of repeated list calls to improve performance.
- Running large Kubernetes clusters is challenging despite community improvements
- Defaults are not always enough and best practices should be followed
- Avoid list calls and use informers to improve performance
- Keep spare memory and CPU headroom so the cluster can absorb bad events
- Streaming lists, available as of Kubernetes 1.27, can reduce memory usage
The presentation shares an incident in which a naive safeguard against accidental deletion of nodes in a node group triggered hundreds of calls to etcd and degraded performance. The issue was resolved by replacing the list calls with informers.
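
To make the difference concrete, here is a minimal, self-contained sketch of why an informer-style cache is cheaper than repeated list calls. It is a toy model, not the Kubernetes client API: `FakeApiServer`, `NodeCache`, `naive_group_size`, and the node and group names are all hypothetical, and the "watch" is simulated by reading an in-memory event log rather than a real streaming connection. The presentation does not show its actual code, so this only illustrates the general pattern of listing once and then applying incremental events.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeApiServer:
    """Hypothetical stand-in for the API server; counts expensive LIST calls."""
    nodes: Dict[str, str] = field(default_factory=dict)  # node name -> node group
    events: List[Tuple[str, str, str]] = field(default_factory=list)
    list_calls: int = 0

    def list_nodes(self) -> Dict[str, str]:
        # Every call here models a full LIST that the backing store must serve.
        self.list_calls += 1
        return dict(self.nodes)

    def add_node(self, name: str, group: str) -> None:
        self.nodes[name] = group
        self.events.append(("ADDED", name, group))

    def delete_node(self, name: str) -> None:
        group = self.nodes.pop(name)
        self.events.append(("DELETED", name, group))


def naive_group_size(server: FakeApiServer, group: str) -> int:
    """Naive check: issue a full LIST every time the group is inspected."""
    return sum(1 for g in server.list_nodes().values() if g == group)


class NodeCache:
    """Informer-style cache: one initial LIST, then incremental events only."""

    def __init__(self, server: FakeApiServer) -> None:
        self.server = server
        self.store = server.list_nodes()   # the single initial LIST
        self.cursor = len(server.events)   # events before this are already in store

    def sync(self) -> None:
        # Simulated watch: apply only the events that arrived since the last sync.
        for kind, name, group in self.server.events[self.cursor:]:
            if kind == "ADDED":
                self.store[name] = group
            elif kind == "DELETED":
                self.store.pop(name, None)
        self.cursor = len(self.server.events)

    def nodes_in_group(self, group: str) -> List[str]:
        self.sync()
        return sorted(n for n, g in self.store.items() if g == group)


def demo() -> Tuple[int, int, List[str]]:
    server = FakeApiServer()
    for i in range(3):
        server.add_node(f"node-{i}", "group-a")

    # Naive approach: 100 checks cost 100 LIST calls.
    for _ in range(100):
        naive_group_size(server, "group-a")
    naive_calls = server.list_calls

    # Cached approach: 100 checks cost one LIST, even as nodes change.
    server.list_calls = 0
    cache = NodeCache(server)
    server.add_node("node-3", "group-a")
    server.delete_node("node-0")
    members: List[str] = []
    for _ in range(100):
        members = cache.nodes_in_group("group-a")
    return naive_calls, server.list_calls, members


if __name__ == "__main__":
    print(demo())
```

In this sketch the naive path scales its list calls with the number of checks, while the cached path pays for one list and then reads from local memory. A real informer follows the same idea but receives changes over a watch connection instead of polling a log.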