Authors: Maulin Patel, Pradeep Venkatachalam
2022-05-18

tldr - powered by Generative AI

The presentation discusses the challenges of sharing GPUs in Kubernetes and introduces two solutions: time sharing and multi-instance GPU.
  • Notebooks attached to GPUs waste expensive resources when idle
  • Real-time applications such as chatbots, vision product search, and product recommendation are latency-sensitive and business-critical
  • Kubernetes lets containers request a fraction of a CPU but only whole GPUs, which leads to inefficient GPU allocation
  • Time sharing allows multiple containers to run on a single GPU by allocating time slices fairly to all containers
  • Multi-instance GPU (MIG) allows multiple containers to share a single GPU by partitioning it into multiple smaller GPU instances
  • Both solutions address most use cases and workload needs
  • Both solutions are fully managed by GKE and can be configured through API calls or the UI
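
To make the two sharing modes more concrete, here is a minimal Python sketch of how a workload might ask for a shared GPU. It is not taken from the talk. The node label keys are written as I recall them from GKE's GPU-sharing documentation and should be checked against the current docs before use. The helper name `gpu_pod_manifest`, the pod names, and the image URLs are hypothetical.

```python
import json

# Node label keys as recalled from GKE's GPU-sharing documentation.
# Treat them as assumptions and verify against the current docs.
SHARING_STRATEGY_LABEL = "cloud.google.com/gke-gpu-sharing-strategy"
MAX_CLIENTS_LABEL = "cloud.google.com/gke-max-shared-clients-per-gpu"
PARTITION_SIZE_LABEL = "cloud.google.com/gke-gpu-partition-size"


def gpu_pod_manifest(name, image, *, max_shared_clients=None, partition_size=None):
    """Build a Pod manifest (as a dict) that targets a shared-GPU node pool.

    max_shared_clients: select nodes configured for time sharing with this
        many containers per GPU.
    partition_size: select nodes whose GPUs are split into multi-instance
        GPU partitions of this size (for example "1g.5gb").
    """
    node_selector = {}
    if max_shared_clients is not None:
        node_selector[SHARING_STRATEGY_LABEL] = "time-sharing"
        # Kubernetes label values are strings, so the count is stringified.
        node_selector[MAX_CLIENTS_LABEL] = str(max_shared_clients)
    if partition_size is not None:
        node_selector[PARTITION_SIZE_LABEL] = partition_size

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": node_selector,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The container still requests one GPU resource; the
                    # node pool decides whether that is a time-shared GPU
                    # or a GPU partition.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical idle-prone notebook placed on a time-shared GPU.
    notebook = gpu_pod_manifest(
        "notebook", "example.com/notebook:latest", max_shared_clients=3
    )
    # Hypothetical inference service placed on a GPU partition.
    inference = gpu_pod_manifest(
        "inference", "example.com/inference:latest", partition_size="1g.5gb"
    )
    print(json.dumps(notebook, indent=2))
    print(json.dumps(inference, indent=2))
```

In this sketch the container spec is identical in both cases and only the node selector changes, which mirrors the summary's point that sharing is configured on the platform side rather than inside the application.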