Better Reliability Through Observability and Experimentation


Authors:   Kerim Satirli, Julie Gunderson


Site Reliability Engineering (SRE) treats reliability as a software problem, but it is really an organizational problem that requires a different mindset. When the reliability of our service drops, so does our ability to create value for the organization we represent. In this talk, Julie and Kerim will take the audience on a guided journey, starting with how to determine whether and how workloads are misbehaving and ending with practical approaches to improve reliability. Through simulated outages (of all types!), observability, and analysis, Julie and Kerim will show attendees how to catch and prepare for service disruptions. Going beyond deployments, attendees will also learn how to combine OpenTelemetry and OpenTracing to instill reliability into their systems.

