Node Resource Management: The Big Picture

2023-04-19

Authors: Alexander Kanevskiy, Swati Sehgal, David Porter, Sascha Grunert, Evan Lezar


Summary

The presentation discusses the importance of resource management in Kubernetes and highlights new features and enhancements in the ecosystem, such as the Container Device Interface (CDI) and cgroups v2.
  • The CDI allows GPUs and other devices to be shared across containers and pods, partitioned dynamically, and mixed and matched.
  • Topology-aware scheduling is not the only use case for Node Resource Interface (NRI) plugins; top-level attributes can be used for other capabilities as well.
  • cgroups v2 provides new resource management capabilities, such as memory QoS and PSI (Pressure Stall Information) metrics, and there are plans to explore I/O isolation and network QoS guarantees.
  • The speaker encourages feedback from the audience on resource management challenges and desired features.
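
To make the PSI metrics mentioned above more concrete, here is a minimal Python sketch that parses the text format the Linux kernel exposes in /proc/pressure/memory (and in per-cgroup files such as memory.pressure under cgroups v2). The function names and the sample values are hypothetical and chosen only for illustration; this is not how the kubelet or any container runtime reads these files.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI output into {"some": {...}, "full": {...}}.

    Each line has the shape:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_psi(path: str = "/proc/pressure/memory") -> dict[str, dict[str, float]]:
    """Read and parse a PSI file; requires a kernel built with PSI support."""
    return parse_psi(Path(path).read_text())


# Made-up sample values, not measurements from a real node.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)

if __name__ == "__main__":
    print(parse_psi(SAMPLE)["some"]["avg10"])
```

The "some" line reports the share of time in which at least one task was stalled on the resource, and "full" the share in which all non-idle tasks were stalled at once; the avg fields are percentages over 10, 60, and 300 second windows.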
The speaker mentions memory QoS, an alpha feature that received updates in Kubernetes 1.27. It provides minimum guarantees for memory and can help avoid out-of-memory situations, which is important for applications that need a minimum amount of memory to function properly.
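
As a rough illustration of how such a minimum guarantee can map onto cgroups v2, the Python sketch below derives memory.min, memory.high, and memory.max values from a container's memory request and limit. It assumes the formula publicly described for the alpha feature around Kubernetes 1.27 (memory.min equals the request, and memory.high is placed a throttling factor of the way between request and limit, rounded down to a page boundary). The function name, the 0.9 default factor, and the 4096-byte page size are assumptions for this example, not values taken from the presentation.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumed page size in bytes


def memory_qos_settings(
    request: int,
    limit: Optional[int],
    node_allocatable: int,
    throttling_factor: float = 0.9,  # assumed default
    page_size: int = PAGE_SIZE,
) -> dict[str, Optional[int]]:
    """Sketch of cgroup v2 memory settings for one container (bytes).

    memory.min  : memory the kernel should not reclaim (the request)
    memory.high : threshold above which the cgroup is throttled
    memory.max  : hard limit (None when the container sets no limit)
    """
    # Without a limit, fall back to the node's allocatable memory.
    ceiling = limit if limit is not None else node_allocatable
    high = request + throttling_factor * (ceiling - request)
    # Round down to a whole number of pages.
    high_aligned = int(high // page_size) * page_size
    return {
        "memory.min": request,
        "memory.high": high_aligned,
        "memory.max": limit,
    }


GIB = 1024 ** 3

if __name__ == "__main__":
    print(memory_qos_settings(request=1 * GIB, limit=2 * GIB, node_allocatable=8 * GIB))
```

This simplification ignores QoS-class special cases (for example, whether memory.high is set at all when request equals limit), so it should be read as a sketch of the idea and not as the kubelet's actual behavior.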

Abstract

Resource management is a fundamental area in Kubernetes that focuses on how to properly reserve, allocate, and isolate finite resources on nodes such as CPU, memory, disk, network, accelerators, etc. Resource management is a hot topic, with multiple proposals raised recently on how to improve things both in Kubernetes and container runtimes: Dynamic Resource Allocation, QoS class resources, improvements to CPU management, to container lifecycle management and statistics, support in CRI-enabled container runtimes for advanced low-level runtimes such as Kata Containers, Firecracker, gVisor, and Confidential Containers, and many more. In this presentation, speakers will present the “big picture” for these proposals: how they are interconnected, how they differ, which problems they aim to solve, and what they mean for Kubernetes users. This presentation will help cluster administrators and users understand the future direction of resource management and give them a framework for providing feedback that can help shape these efforts. We will also describe opportunities for those interested in getting involved with the open source SIG-Node and runtime communities to drive these efforts forward.

