Authors: Shuo Chen
2023-04-19

tldr - powered by Generative AI

Databricks uses Kata Containers to enforce hard multi-tenancy in Kubernetes clusters, providing strong isolation for performance-sensitive workloads such as Data Lakehouse. The case study covers the challenges faced, the trade-offs among security, performance, and cost, and ways to work around heterogeneity across public cloud providers.
  • Databricks is building a serverless platform for performance-sensitive workloads such as Data Lakehouse on Kubernetes clusters
  • They need hard multi-tenant container isolation since each cluster runs code on behalf of multiple customers
  • They chose Kata Containers, an open-source container runtime that provides strong isolation by running containers in micro-VMs
  • They built a hard compute and network isolation layer among untrusted workloads in Kubernetes clusters using Kata Containers, network policies, and network security groups
  • They share first-hand experience integrating Kata Containers with Kubernetes in production, highlighting the challenges they faced, the difficult trade-offs among security, performance, and cost, and how they worked around heterogeneity across public cloud providers
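
The isolation layer described above can be sketched with standard Kubernetes objects. This is a minimal illustration, not Databricks' actual configuration: the handler name "kata", the one-namespace-per-tenant layout, and all object names are assumptions. The script only builds manifests as Python dicts and prints them as JSON, which kubectl accepts.

```python
import json

# Assumption: the node's container runtime (e.g. containerd) is configured
# with a Kata handler under this name; the real name depends on node setup.
KATA_HANDLER = "kata"


def runtime_class(name="kata-isolated", handler=KATA_HANDLER):
    """RuntimeClass that maps a cluster-visible name to the Kata handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def tenant_pod(tenant, image, runtime_class_name="kata-isolated"):
    """Pod for an untrusted tenant workload, run via the given RuntimeClass.

    Hypothetical layout: one namespace per tenant.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{tenant}-workload", "namespace": tenant},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": "main", "image": image}],
        },
    }


def default_deny_policy(namespace):
    """NetworkPolicy selecting every pod in the namespace and allowing no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


if __name__ == "__main__":
    manifests = [
        runtime_class(),
        tenant_pod("tenant-a", "example.com/tenant-a/app:latest"),
        default_deny_policy("tenant-a"),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

A default-deny policy is only a starting point; real deployments would add explicit allow rules per tenant. Cloud-level network security groups sit outside Kubernetes and are not modeled here.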
Authors: Cong Yue, David Oppenheimer
2022-10-26

tldr - powered by Generative AI

The presentation discusses the three-phase migration process for moving from a non-HA control plane to an HA control plane in Kubernetes, with a focus on protecting the cluster state and ensuring minimal impact on workloads.
  • The migration process is divided into three phases, with multiple steps in each phase to ensure the cluster state is protected and workloads are not impacted
  • The first phase involves getting the cluster state from the non-HA control plane and building a snapshot for use in the second phase
  • The second phase involves migrating the cluster state to the HA control plane, with traffic to the control plane shut down to prevent cluster data mutation
  • The third phase involves confirming that everything is working properly and reopening traffic to the control plane
  • The main focus throughout the migration process is on protecting the cluster state and ensuring minimal impact on workloads
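
The three phases above can be sketched as a small in-memory simulation. Everything here is hypothetical: the ControlPlane class, the function names, and the dict standing in for cluster state are illustrative assumptions, not the presenters' tooling.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical in-memory stand-in for a control plane and its state store."""
    name: str
    state: dict = field(default_factory=dict)
    accepting_traffic: bool = True

    def write(self, key, value):
        if not self.accepting_traffic:
            raise RuntimeError(f"{self.name}: traffic is shut down, write rejected")
        self.state[key] = value


def phase1_snapshot(source):
    """Phase 1: read the cluster state from the non-HA plane and build a snapshot."""
    return copy.deepcopy(source.state)


def phase2_migrate(source, target, snapshot):
    """Phase 2: shut down traffic so state cannot mutate, then load the snapshot."""
    source.accepting_traffic = False
    target.accepting_traffic = False
    target.state = copy.deepcopy(snapshot)


def phase3_verify_and_reopen(source, target):
    """Phase 3: confirm the migrated state matches, then reopen traffic."""
    if target.state != source.state:
        raise RuntimeError("state mismatch; keep traffic closed and investigate")
    target.accepting_traffic = True


def migrate(source, target):
    snapshot = phase1_snapshot(source)
    phase2_migrate(source, target, snapshot)
    phase3_verify_and_reopen(source, target)
    return target


if __name__ == "__main__":
    old = ControlPlane("non-ha")
    old.write("deployments/web", {"replicas": 3})
    new = migrate(old, ControlPlane("ha", accepting_traffic=False))
    print(new.name, new.accepting_traffic, new.state)
```

In this sketch, a write that lands on the old plane between the snapshot and the traffic shutdown would surface as a mismatch in phase 3 rather than being silently lost.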