At large public companies, Kubernetes clusters are critical infrastructure. They carry large amounts of traffic, have complex dependencies on third-party services, and change constantly as developers release features and traffic scales up and down. In this panel discussion, engineers from Airbnb, Lyft, Netflix and Robinhood share the challenges they have faced and the lessons they have learned in running a sustainable on-call rotation. That rotation has to meet the needs of their internal users while maintaining the high uptime that business-critical workloads require. Topics covered will include:

+ Keeping on-call engineers happy
+ Balancing rapid response with alert fatigue
+ Strategies to proactively deal with production issues
+ Preparing engineers for on-call

The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud native communities in Detroit, Michigan, from October 24 to 28, 2022. Join containerd, CoreDNS, Envoy, etcd, Fluentd, Harbor, Helm, Jaeger, Kubernetes, Linkerd, Open Policy Agent, Prometheus, Rook, TiKV, TUF, Vitess, Argo, Backstage, Buildpacks, Chaos Mesh, Cilium, CloudEvents, CNI, Contour, Cortex, CRI-O, Crossplane, CubeFS, dapr, Dragonfly, Emissary Ingress, Falco, Flagger, Flux, gRPC, Hubble, in-toto, KEDA, Keptn, Knative, KubeEdge, KubeVirt, Kyverno, Litmus, Longhorn, NATS, Notary, OpenMetrics, OpenTelemetry, Operator Framework, SPIFFE, SPIRE, Tetragon, Thanos, and Volcano as the community gathers for five days to further the education and advancement of cloud native computing.

Thriving With Kubernetes On-Call: Best Practices & Lessons Learned

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Sunil Shah, Ramya Krishnan, Ashley Cutalo, Madhu C.S., Fabio Kung

