Authors: Kevin Klues, Alexey Fomenko
2023-04-19

tldr - powered by Generative AI

Overview of building a DRA resource driver for Kubernetes
  • A DRA resource driver consists of a centralized controller and a node-local plugin
  • Communication between the two components can be done through a single all-purpose CRD
  • The controller makes allocation decisions and the plugin advertises available resources
  • The driver needs to define a name, communication strategy, resource types, class parameters, and API access
  • Helper libraries are available to make implementation easier (a sketch of the controller/plugin split follows below)
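
To make the controller/plugin split concrete, here is a minimal Go sketch of the two halves of a DRA resource driver. All type and method names are illustrative assumptions rather than the actual helper-library API (the real helpers live under k8s.io/dynamic-resource-allocation); the point is only the division of labor: the centralized controller allocates, the node-local plugin advertises and prepares.

```go
package driver

import "context"

// Claim stands in for the single all-purpose CRD the talk describes:
// one custom resource that both halves of the driver read and write.
// All names in this sketch are illustrative, not a real library API.
type Claim struct {
	Name       string
	Parameters map[string]string // class/claim parameters from the CRD
	Devices    []string          // filled in once allocation succeeds
}

// Controller is the centralized half: it watches claims cluster-wide
// and makes allocation decisions.
type Controller interface {
	// Allocate picks concrete resources for a claim, possibly for a
	// specific node chosen by the scheduler.
	Allocate(ctx context.Context, claim *Claim, selectedNode string) error
	// Deallocate releases resources when the claim goes away.
	Deallocate(ctx context.Context, claim *Claim) error
}

// NodePlugin is the node-local half: it advertises what the node has
// and prepares allocated resources for containers.
type NodePlugin interface {
	// AdvertiseResources publishes the devices present on this node.
	AdvertiseResources(ctx context.Context) ([]string, error)
	// PrepareResource makes an allocated device usable by a pod.
	PrepareResource(ctx context.Context, claim *Claim) error
	// UnprepareResource undoes PrepareResource.
	UnprepareResource(ctx context.Context, claim *Claim) error
}
```

In a real driver, the claim and its parameters would be carried by the all-purpose CRD mentioned above, and the node plugin would register with the kubelet over gRPC.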
Authors: Diogo Guerra, Diana Gaponcic
2023-04-19

tldr - powered by Generative AI

The presentation discusses GPU utilization and benchmarking, focusing on time-slicing and MIG (Multi-Instance GPU), and provides insights on their use cases and performance trade-offs (a pod-spec sketch of the two sharing modes follows the list below).
  • Time-slicing is useful for low-priority jobs with idle time, but not suitable for latency-sensitive or performance-intensive tasks.
  • MIG enables GPU sharing, but each slice has fewer streaming multiprocessors, so per-slice performance drops.
  • Benchmarking shows that time-slicing incurs a significant performance loss when context switching is required for long-running processes.
  • Doubling memory and bandwidth through MIG can improve performance, but enabling MIG without actually sharing the GPU sacrifices performance for no benefit.
  • Monitoring pipeline utilization can help understand user jobs and optimize GPU usage.
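
To make the two sharing modes concrete in Kubernetes terms, the sketch below builds a pod that requests a single MIG slice through an extended resource name. The nvidia.com/mig-1g.5gb name assumes the NVIDIA device plugin's "mixed" MIG strategy; under time-slicing, the same pod would request nvidia.com/gpu instead, and several such pods would share one physical GPU.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "mig-benchmark"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "bench",
				Image: "nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04",
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// One 1g.5gb MIG slice; assumes the device plugin's
						// "mixed" MIG strategy. A time-sliced setup would
						// request "nvidia.com/gpu" here instead.
						corev1.ResourceName("nvidia.com/mig-1g.5gb"): resource.MustParse("1"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.Containers[0].Resources.Limits)
}
```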
Conference:  Transform X 2022
Authors: Varun Mohan
2022-10-19

Graphics Processing Units (GPUs) are used to train and serve artificial intelligence and deep learning models, particularly in ML inference use cases. However, using GPUs to deploy models at scale can create several challenges for ML practitioners. In this session, Varun Mohan, CEO and Co-Founder of Exafunction, shared the best practices he has learned for building an architecture that optimizes GPUs for deep learning workloads. Mohan explained the advantages of using GPUs for ML deployment, as well as the cases where they offer fewer benefits, and weighed cost, memory, and other factors in the GPU-vs-CPU equation. He covered inefficiencies that can arise in different scenarios, including issues related to network bandwidth and egress, and offered techniques to address them, such as batching workloads and optimizing models. Finally, he discussed how some companies use GPUs to run their recommendation and serving systems. Before Exafunction, Mohan was a technical lead and senior manager at Nuro, where he saw the power of deep learning and the challenges of productionizing it at scale.
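
One of the techniques highlighted here, batching inference requests so the GPU stays busy, can be sketched in a few lines of Go. This is an illustrative pattern under assumed types, not Exafunction's implementation: requests accumulate until the batch is full or a deadline passes, and the whole batch goes to the model in one call.

```go
package batching

import "time"

// Request is a placeholder for an inference input.
type Request struct {
	Input string
	Done  chan string // receives the model's output
}

// RunBatcher collects requests until maxBatch items arrive or maxWait
// elapses, then runs the whole batch in a single model call. Batching
// amortizes per-call overhead and keeps the GPU's compute units full.
func RunBatcher(in <-chan Request, maxBatch int, maxWait time.Duration,
	infer func([]Request)) {
	for {
		// Block for the first request of the next batch.
		first, ok := <-in
		if !ok {
			return
		}
		batch := []Request{first}
		timeout := time.After(maxWait)
	collect:
		for len(batch) < maxBatch {
			select {
			case r, ok := <-in:
				if !ok {
					break collect
				}
				batch = append(batch, r)
			case <-timeout:
				break collect
			}
		}
		infer(batch) // one GPU call for the whole batch
	}
}
```

The maxWait deadline bounds the latency cost of waiting for a full batch, which is the trade-off batching introduces.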
Authors: Steve Wong, Myles Gray
2021-10-14

tldr - powered by Generative AI

The presentation discusses the Bitfusion integration for Kubernetes and its benefits for running AI/ML workloads on Kubernetes.
  • Bitfusion uses a client-server model that abstracts the GPU away from the workload, allowing GPUs to be consolidated into fewer VMs
  • The Bitfusion integration for Kubernetes automatically injects the Bitfusion client into an app's deployment, enabling transparent access to GPUs
  • Using Bitfusion with Kubernetes can increase efficiency and reduce energy consumption for AI/ML workloads (a sketch of the injection pattern follows below)
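
The "automatic injection" mentioned above is typically implemented with a mutating admission webhook: the webhook sees an opt-in marker on the pod and rewrites the spec so the app reaches remote GPUs transparently. The sketch below is a hedged illustration of that pattern in Go; the annotation key, image name, and mount path are assumptions for illustration, not the actual Bitfusion integration's values.

```go
package webhook

import corev1 "k8s.io/api/core/v1"

// injectBitfusionClient mimics what an injecting webhook does: if a
// pod opts in via an annotation, mount the client bits so the app can
// reach remote GPUs transparently. Names are illustrative assumptions.
func injectBitfusionClient(pod *corev1.Pod) {
	if pod.Annotations["auto-management/bitfusion"] != "all" {
		return // pod did not opt in; leave it untouched
	}
	// Share the client binaries with the app via an init container
	// and a common emptyDir volume (an assumed injection mechanism).
	vol := corev1.Volume{
		Name: "bitfusion-client",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}
	pod.Spec.Volumes = append(pod.Spec.Volumes, vol)
	pod.Spec.InitContainers = append(pod.Spec.InitContainers, corev1.Container{
		Name:  "bitfusion-client-install",
		Image: "bitfusion-client:example", // hypothetical image
		VolumeMounts: []corev1.VolumeMount{{
			Name: "bitfusion-client", MountPath: "/bitfusion",
		}},
	})
	// Every app container gets the client mounted at the same path.
	for i := range pod.Spec.Containers {
		pod.Spec.Containers[i].VolumeMounts = append(
			pod.Spec.Containers[i].VolumeMounts,
			corev1.VolumeMount{Name: "bitfusion-client", MountPath: "/bitfusion"},
		)
	}
}
```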