Archetypes for Reliable Systems

Conference: KubeCon + CloudNativeCon Europe 2023

2023-04-20

Authors: Ameer Abbas, Steve McGhee

Summary

The presentation discusses the importance of starting with archetypes when building resilient platforms and services, and the trade-offs between reliability and effort.

Archetypes provide known-good starting points for building resilient platforms and services.
Applications consist of multiple services; microservices should be used so that an application degrades gracefully rather than failing outright.
Resilient teams are necessary to build robust platforms that can handle risks.
Reliability trades off against effort: each higher level of reliability requires exponentially more effort.

The presentation uses a graph to illustrate this trade-off: the effort required grows exponentially as the reliability target rises. The speakers stress weighing that effort when choosing a reliability target, and suggest archetypes as the place to start when building resilient platforms and services.
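One way to get a feel for why each step up in reliability costs more is to look at how little downtime each additional "nine" leaves. The sketch below is not from the talk; it only does the arithmetic for a yearly downtime budget, and the function name `allowed_downtime_minutes` is a hypothetical helper. It does not model effort itself, only the shrinking margin for failure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget in minutes for an availability target in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Each extra nine divides the budget by ten.
    for target in (0.99, 0.999, 0.9999, 0.99999):
        budget = allowed_downtime_minutes(target)
        print(f"{target:.5f} -> {budget:8.1f} minutes/year")
```

At 99.9% the budget is roughly 526 minutes a year; at 99.99% it is roughly 53. The work needed to stay inside that smaller budget is what the talk's curve is about.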

Abstract

We present a model and implementation for designing and running cloud-based internet services at various levels of intended reliability, based on "Deployment Archetypes for Cloud Applications" [Berenberg, Calder, 2022] (https://dl.acm.org/doi/full/10.1145/3498336#). This model allows cloud customers to describe the reliability needs (availability, failure domain resilience, RTO/RPO) of an application and then provides a Kubernetes-based deployment strategy that implements the matching archetype. Our implementation provides a multi-tenant, multi-application, multi-cluster strategy, with CI/CD, micro-segmentation, policy management, traffic routing, SLOs, and application and infrastructure monitoring. This lets application teams own their services while infrastructure teams perform updates without service interruption.
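As an illustration of what "describing the reliability needs of an application" could look like as data, here is a minimal sketch. It is an assumption for illustration only: the class and field names (`ReliabilityNeeds`, `FailureDomain`, `survives`, `monthly_error_budget`) are hypothetical and are not taken from the talk, the paper, or any Kubernetes API.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class FailureDomain(Enum):
    """Largest failure domain the application is expected to survive (hypothetical)."""
    ZONE = "zone"
    REGION = "region"


@dataclass(frozen=True)
class ReliabilityNeeds:
    """Hypothetical record of an application's stated reliability needs."""
    availability: float      # target fraction of time available, e.g. 0.999
    survives: FailureDomain  # failure domain resilience
    rto: timedelta           # recovery time objective
    rpo: timedelta           # recovery point objective

    def __post_init__(self) -> None:
        if not 0.0 < self.availability < 1.0:
            raise ValueError("availability must be strictly between 0 and 1")
        if self.rto < timedelta(0) or self.rpo < timedelta(0):
            raise ValueError("RTO and RPO must be non-negative")

    def monthly_error_budget(self, days: int = 30) -> timedelta:
        """Downtime allowed over a window of `days` at the availability target."""
        return timedelta(days=days) * (1.0 - self.availability)


if __name__ == "__main__":
    needs = ReliabilityNeeds(
        availability=0.999,
        survives=FailureDomain.ZONE,
        rto=timedelta(hours=1),
        rpo=timedelta(minutes=5),
    )
    print(needs.survives.value, needs.monthly_error_budget())
```

Writing the needs down this explicitly makes it possible to compare them against what a given deployment archetype can offer; that selection step is left out here because the source does not spell it out.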
