What We Learned from Reading 100+ Kubernetes Post-Mortems

Conference: KubeCon + CloudNativeCon North America 2021

2021-10-15

Authors: Shimon Tolts, Noaa Barki

Summary

Learning from 100+ Kubernetes post-mortems to prevent production outages

Reviewed 100+ post-mortems to discover the recurring patterns, anti-patterns, and root causes of typical outages in Kubernetes-based systems
Aggregated the resulting insights to review the most obvious DON'Ts, and some less obvious ones, that can help prevent production outages
Shift responsibility left by delegating knowledge and educating developers on industry best practices
Anecdote about attending a DevOps meetup and realizing the importance of DevOps for developers

During a panel discussion at a DevOps meetup, developers expressed frustration over using the JFrog registry and over security concerns. The speaker emphasized that developers need to understand Kubernetes and its components, and that DevOps teams need to delegate knowledge and educate developers on best practices. The speaker also highlighted the importance of shifting responsibility left so that DevOps does not become a bottleneck.

Abstract

A smart person learns from their own mistakes, but a truly wise person learns from the mistakes of others. When launching our product, we wanted to learn as much as possible about typical pains in our ecosystem, and did so by reviewing many post-mortems (100+!) to discover the recurring patterns, anti-patterns, and root causes of typical outages in Kubernetes-based systems. In this talk we have aggregated the insights we gathered for you. In particular, we will review the most obvious DON'Ts, and some less obvious ones, that may help you prevent your next production outage by learning from others' real-world (horror) stories.
