KubeCon + CloudNativeCon Europe 2023

GPUs and accelerators are changing traditional High Energy Physics (HEP) deployments, while also being key to enabling efficient machine learning. Until now, GPU scheduling in Kubernetes has been limited: because a single GPU cannot easily be shared by multiple workloads, it is used inefficiently when those workloads are light or spiky. At the same time, these resources are scarce, expensive and in high demand. In this talk we explore different ways to improve the overall usage of GPU resources. We examine the options for GPU scheduling and sharing, from time sharing to the recently introduced NVIDIA Multi-Instance GPU (MIG) for physical partitioning. We cover the features and limitations of each option and present extensive benchmark results that helped us assign each workload to the most appropriate layout. Finally, we describe how we manage GPUs centrally to ensure optimal resource utilization for services such as continuous integration, machine learning and batch.
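To make the two sharing mechanisms concrete, the sketch below builds, as plain Python dictionaries, (a) a device-plugin configuration that advertises each physical GPU as several time-sliced replicas and (b) a minimal Pod manifest that requests either a whole or time-sliced GPU (`nvidia.com/gpu`) or a MIG slice. It is a sketch under stated assumptions, not the speakers' setup: the field layout follows the NVIDIA Kubernetes device plugin's time-slicing configuration as we understand it, `nvidia.com/mig-1g.5gb` assumes the plugin's "mixed" MIG strategy on an A100 40GB, and the helper names, pod name and image (`gpu_pod_manifest`, `time_slicing_config`, `ci-test`, `example.org/ci:latest`) are hypothetical. Check the resource names against the device plugin version actually deployed.

```python
import json

# Extended resource names assumed to be advertised by the NVIDIA device plugin.
# "nvidia.com/gpu" covers whole GPUs and, when time-slicing is enabled,
# time-sliced replicas of a GPU. MIG slices appear under profile-specific
# names with the "mixed" strategy; "mig-1g.5gb" is an assumed A100 40GB profile.
FULL_OR_TIME_SLICED_GPU = "nvidia.com/gpu"
MIG_1G_5GB = "nvidia.com/mig-1g.5gb"


def time_slicing_config(replicas: int) -> dict:
    """Hypothetical helper: device-plugin config sharing each GPU `replicas` ways."""
    if replicas < 2:
        raise ValueError("time-slicing needs at least 2 replicas per GPU")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": FULL_OR_TIME_SLICED_GPU, "replicas": replicas}]
            }
        },
    }


def gpu_pod_manifest(name: str, image: str, resource: str, count: int = 1) -> dict:
    """Hypothetical helper: minimal Pod requesting `count` units of a GPU resource."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested through limits; when no
                    # request is given, Kubernetes defaults it to the limit.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # A light CI job gets a small MIG slice instead of a whole GPU.
    ci_job = gpu_pod_manifest("ci-test", "example.org/ci:latest", MIG_1G_5GB)
    print(json.dumps(ci_job, indent=2))
    # Alternatively, share each GPU four ways by time-slicing.
    print(json.dumps(time_slicing_config(4), indent=2))
```

The design difference the talk evaluates shows up in the resource names: with time-slicing, pods still ask for `nvidia.com/gpu` and share the GPU's memory and compute in turns, whereas a MIG request names a hardware-isolated partition with its own memory and compute slice.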

Efficient Access to Shared GPU Resources: Mechanisms and Use Cases