Migrating From Single-Node Kubernetes Control Plane To HA In Production

2022-10-26

Authors: Cong Yue, David Oppenheimer


Summary

The presentation discusses the three-phase migration process for moving from a non-HA control plane to an HA control plane in Kubernetes, with a focus on protecting the cluster state and ensuring minimal impact on workloads.
  • The migration process is divided into three phases, with multiple steps in each phase to ensure the cluster state is protected and workloads are not impacted
  • The first phase involves getting the cluster state from the non-HA control plane and building a snapshot for use in the second phase
  • The second phase involves migrating the cluster state to the HA control plane, with traffic to the control plane shut down to prevent cluster state mutation
  • The third phase involves confirming that everything is working properly and reopening traffic to the control plane
  • The main focus throughout the migration process is on protecting the cluster state and ensuring minimal impact on workloads
An outage caused by inconsistent cluster state during migration taught the team the importance of keeping the cluster state consistent throughout the process, both to prevent downtime and to keep workloads unaffected.
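
The three phases summarized above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions, not the tooling described in the talk: the `ControlPlane` class, the `state_digest` and `migrate` functions, and the fallback to the old control plane on a mismatch are all hypothetical names and choices made for this example.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy stand-in for a control plane: a state dict plus a traffic gate."""
    name: str
    state: dict = field(default_factory=dict)
    accepting_traffic: bool = True

    def write(self, key, value):
        # Writes are rejected while traffic is closed, so state cannot mutate.
        if not self.accepting_traffic:
            raise RuntimeError(f"{self.name}: traffic is closed")
        self.state[key] = value


def state_digest(state):
    """Deterministic fingerprint of a state dict, used to compare states."""
    payload = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def migrate(old, new):
    """Run the three phases; return the control plane that ends up serving."""
    # Phase 1: capture the cluster state from the non-HA control plane.
    snapshot = copy.deepcopy(old.state)

    # Phase 2: close traffic so the state cannot change, then load the
    # snapshot into the HA control plane.
    old.accepting_traffic = False
    new.accepting_traffic = False
    new.state = copy.deepcopy(snapshot)

    # Phase 3: confirm the migrated state matches the source before
    # reopening traffic. Falling back to the old control plane on a
    # mismatch is an assumption of this sketch.
    if state_digest(new.state) != state_digest(old.state):
        old.accepting_traffic = True
        return old
    new.accepting_traffic = True
    return new
```

For example, calling `migrate(ControlPlane("single-node", {"deploy/web": {"replicas": 3}}), ControlPlane("ha"))` returns the HA object with an identical state, while the old object keeps rejecting writes. Comparing digests before reopening traffic is one simple way to express the consistency requirement the summary emphasizes.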

Abstract

Databricks adopted Kubernetes in 2016, before highly available (HA) Kubernetes control plane deployments were common. As a result, we built our self-managed Kubernetes clusters using a single-node control plane on AWS, and later also on Azure and GCP. Recently we migrated these production clusters to use a multi-node control plane, which provides higher reliability and enables us to upgrade Kubernetes versions more safely and therefore faster across the fleet. In this talk we describe the architecture we chose for our HA control plane, how we safely migrated a fleet of clusters from a single-node control plane to HA without affecting workloads in production, and how we adapted some of our Day 2 operations to accommodate a multi-node control plane.

Materials: