Multi-Cluster Stateful Set Migration: A Solution To Upgrade Pain

2022-10-26

Authors: Matt Schallert, Peter Schuurman


Summary

The presentation discusses the challenges of, and solutions for, cross-cluster migration of stateful workloads in Kubernetes using multi-cluster services and StatefulSet slices.
  • Multi-cluster services provide a way to uniquely identify replicas across clusters and to discover their endpoints.
  • StatefulSet slices allow granular control over replica ordinals and let clusters scale in a complementary fashion.
  • Cross-cluster migration requires coordinating these building blocks and migrating dependencies.
  • Challenges of cross-cluster migration include managing multiple Kubernetes control planes and meeting client-side quorum requirements.
  • Stateful workloads require unique solutions for cross-cluster migration.
  • An anecdote is provided about the complexity of migrating a metrics datastore between Kubernetes clusters.
In the anecdote, the presenter describes how migrating a metrics datastore between Kubernetes clusters required finding alternatives to deleting StatefulSets and manually moving nodes between clusters.
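
The idea of slices scaling in a complementary fashion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: the `Slice` class, the `migration_steps` and `pod_dns` helpers, and the choice to move the highest ordinal first are all hypothetical. The DNS name layout assumes the per-pod naming convention of the Multi-Cluster Services API for headless services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """Hypothetical model: a contiguous range of replica ordinals in one cluster."""
    start: int     # first ordinal owned by this slice
    replicas: int  # number of replicas in this slice

    def ordinals(self) -> range:
        return range(self.start, self.start + self.replicas)


def migration_steps(total: int):
    """Yield (source, destination) slices while replicas move one at a time.

    Assumption: the highest ordinal moves first. At every step the source
    shrinks by one replica and the destination lowers its start ordinal by
    one and grows by one, so each ordinal is owned by exactly one cluster.
    """
    for moved in range(total + 1):
        source = Slice(start=0, replicas=total - moved)
        destination = Slice(start=total - moved, replicas=moved)
        yield source, destination


def pod_dns(name: str, ordinal: int, cluster: str, service: str, namespace: str) -> str:
    """Build a cluster-qualified pod name (assumed multi-cluster DNS layout)."""
    return f"{name}-{ordinal}.{cluster}.{service}.{namespace}.svc.clusterset.local"


if __name__ == "__main__":
    for src, dst in migration_steps(3):
        print("source:", list(src.ordinals()), "destination:", list(dst.ordinals()))
```

Because the two slices never overlap, a client that tracks quorum by ordinal sees the same set of replica identities throughout the migration; only the cluster part of each endpoint name changes.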

Abstract

As more stateful workloads like Redis, Kafka, or custom DBs are migrated to Kubernetes, what operational paradigms need to change to support moving state across clusters and maintaining availability during migration? How do admins safely and reliably perform Day 2 operations and maintenance events while protecting the data and state of the app? What visibility is needed? Today, cluster administrators design complex workflows for data replication, pod and persistent volume migration, and state management for Day 2 ops. What if there was a way to seamlessly migrate StatefulSets between node pools or across clusters to simplify problems related to upgrades, workload migration, and stretching clusters? The speakers will demonstrate the complex patterns developed at Chronosphere to safely migrate stateful workloads to coordinate maintenance operations for thousands of pods across multiple zones and regions. They will then discuss a new enhancement to Kubernetes called StatefulSet Partition which is integrated into a multi-cluster deployment like Chronosphere's and how this can dramatically simplify their operations to focus instead on core business logic.
