
Efficient Access to Shared GPU Resources: Mechanisms and Use Cases

2023-04-19

Authors:   Diogo Guerra, Diana Gaponcic


Summary

The presentation discusses GPU utilization and benchmarking, focusing on time slicing and MIG, and provides insights on their use cases and performance trade-offs.
  • Time slicing is useful for low-priority jobs with idle time, but not suitable for latency-sensitive or performance-intensive tasks.
  • MIG enables GPU sharing but comes with a performance loss, since each instance receives only a fraction of the streaming multiprocessors.
  • Benchmarking shows that time slicing incurs a significant performance loss when context switching is required for long-running processes.
  • Doubling memory and bandwidth through MIG can improve performance, but enabling MIG without actually sharing the GPU sacrifices performance for no benefit.
  • Monitoring pipeline utilization can help understand user jobs and optimize GPU usage.
The presenter uses the movie Reservoir Dogs as an inspiration for the presentation and thanks colleagues for their support in the benchmarking journey.
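As an illustration of the time-slicing option discussed above (not material from the talk itself), the NVIDIA Kubernetes device plugin enables time slicing through its sharing configuration. The sketch below assumes that config format and advertises each physical GPU as four schedulable replicas:

```yaml
# Sketch of an NVIDIA device-plugin configuration enabling time slicing.
# Each physical GPU is advertised as 4 nvidia.com/gpu replicas, so up to
# four pods can land on it. The pods share the GPU by context switching,
# with no memory or fault isolation between them -- suitable for the
# low-priority, spiky jobs mentioned in the summary.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```

Because replicas only multiply the advertised resource count, latency-sensitive workloads scheduled this way still contend for the same hardware, which is why the summary steers them away from time slicing.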

Abstract

GPUs and accelerators are changing traditional High Energy Physics (HEP) deployments while also being the key to enabling efficient machine learning. GPU scheduling in Kubernetes has been limited until now. The inability to easily share a single GPU among multiple workloads leads to inefficiencies when those workloads are light or spiky. At the same time, these resources are scarce, expensive and in high demand. In this talk we explore the different possibilities to improve overall usage of GPU resources. We explore the multiple options for GPU scheduling, time sharing and the recently introduced NVIDIA Multi-Instance GPU (MIG) for physical partitioning. We cover the features and limitations of each option and present extensive benchmark results that helped us assign each workload to the most appropriate layout. Finally we describe how we manage GPUs in a centralized way, ensuring optimal resource utilization for services like continuous integration, machine learning and batch.
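To make the workload-to-layout mapping concrete, here is a small illustrative sketch (not from the talk): given the standard MIG profiles of an A100 40GB and their memory sizes, it picks the smallest instance that fits a job's memory requirement. The profile names and sizes are the real A100 40GB profiles; the `pick_profile` helper is a hypothetical example of the kind of assignment the benchmarks informed.

```python
# Standard MIG profiles on an A100 40GB: (profile name, memory in GB).
# Smaller profiles also get fewer streaming multiprocessors, which is the
# source of the per-instance performance loss noted in the summary.
A100_40GB_PROFILES = [
    ("1g.5gb", 5),
    ("2g.10gb", 10),
    ("3g.20gb", 20),
    ("4g.20gb", 20),
    ("7g.40gb", 40),
]

def pick_profile(mem_gb, profiles=A100_40GB_PROFILES):
    """Return the smallest MIG profile whose memory fits the request,
    or None if the job does not fit on a single instance."""
    for name, mem in profiles:  # profiles are listed smallest first
        if mem >= mem_gb:
            return name
    return None

print(pick_profile(4))   # -> 1g.5gb  (a light job fits the smallest slice)
print(pick_profile(15))  # -> 3g.20gb (needs a larger partition)
```

A scheduler built on this idea would pack several small instances onto one card instead of dedicating a full GPU to a light workload, which is the efficiency gain MIG offers over exclusive access.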
