
Thriving With Kubernetes On-Call: Best Practices & Lessons Learned

2022-10-27

Authors: Sunil Shah, Ramya Krishnan, Ashley Cutalo, Madhu C.S., Fabio Kung


Abstract

Kubernetes clusters are critical infrastructure at large public companies, which handle heavy traffic, depend on complex third-party services, and change constantly as developers release features and traffic scales up and down. In this panel discussion, engineers from Airbnb, Lyft, Netflix, and Robinhood share their challenges, experiences, and lessons learned in managing a sustainable on-call rotation that meets the needs of their internal users whilst maintaining high uptime for business-critical workloads. Topics covered will include:

+ Keeping on-call engineers happy
+ Balancing rapid response with alert fatigue
+ Strategies to proactively deal with production issues
+ Preparing engineers for on-call


