Scale Kubernetes to Manage 60K Nodes Through Architectural Extension

Conference: CloudOpen 2022

2022-06-21

Authors: Ying Xiong, Ying Huang


Summary

The Centaurus project aims to build a cloud infrastructure platform that manages compute nodes at large scale through the Kubernetes API. It targets two scalability challenges: managing clusters of more than 10,000 compute nodes, and provisioning more than 5,000 VMs within minutes. It also aims to unify the platforms used to manage VMs, containers, and serverless applications, and to provide multi-tenant Kubernetes with isolation between customers. The work is driven by customer requests to deploy hybrid applications through a single API and to manage AI training across both cloud and edge nodes.
The project was started in response to scalability issues encountered when managing large clusters and provisioning VMs within minutes, together with customer demand for a unified platform that handles different types of applications through a single API.

Abstract

With the increasing adoption of Kubernetes in public and private clouds, large enterprises are looking for solutions that scale a single cluster to tens of thousands of nodes, primarily to simplify operations. In this talk, the speakers present a mechanism that extends the Kubernetes architecture to manage a cluster of 60K nodes. The extension shards a Kubernetes cluster into two partitions. The Tenant Partition manages customer-related objects such as deployments, pods, services, and endpoints. The Resource Partition manages non-customer objects such as nodes. The speakers also present and analyze performance test results and compare the solution with the community version of Kubernetes along multiple dimensions. Future work to extend the architecture to even larger Kubernetes clusters is also discussed.
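
The two-partition split described above can be illustrated with a minimal sketch. This is not the Centaurus implementation: the names `Partition`, `TENANT_KINDS`, `RESOURCE_KINDS`, and `partition_for` are hypothetical, and the only object kinds listed are the ones the abstract names. The abstract's "such as" leaves the full mapping open, so this sketch rejects unlisted kinds rather than guessing where they belong.

```python
from enum import Enum


class Partition(Enum):
    """The two partitions named in the abstract."""
    TENANT = "tenant"      # customer-related objects
    RESOURCE = "resource"  # non-customer objects


# Hypothetical lookup tables containing only the kinds the abstract mentions.
TENANT_KINDS = {"Deployment", "Pod", "Service", "Endpoints"}
RESOURCE_KINDS = {"Node"}


def partition_for(kind: str) -> Partition:
    """Return the partition that would own an object of the given kind.

    Assumption: kinds the abstract does not mention raise ValueError,
    because the source does not say how they are assigned.
    """
    if kind in RESOURCE_KINDS:
        return Partition.RESOURCE
    if kind in TENANT_KINDS:
        return Partition.TENANT
    raise ValueError(f"no partition assignment known for kind {kind!r}")


if __name__ == "__main__":
    for kind in ("Pod", "Service", "Node"):
        print(kind, "->", partition_for(kind).value)
```

The point of the sketch is the separation itself: customer-facing workload objects and cluster-capacity objects live in different partitions, so each side can presumably be scaled on its own.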

Related work

Authors: Arun M. Krishnakumar, Sahithi Ayloo
2023-04-19