Kubernetes VMware User Group: Using GPUs with K8s on vSphere


Authors: Steve Wong, Myles Gray


The presentation discusses the Bitfusion integration for Kubernetes and its benefits for running AI/ML workloads.
  • Bitfusion uses a client-server model that abstracts the GPU from the workload, allowing GPUs to be consolidated into fewer VMs
  • Bitfusion integration for Kubernetes allows for automatic injection of the Bitfusion client into an app's deployment, enabling transparent access to GPUs
  • Using Bitfusion with Kubernetes can increase efficiency and reduce energy consumption for AI/ML workloads
The speaker uses the analogy of a shark versus a school of piranhas to illustrate the benefits of GPU parallel processing for certain types of jobs
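As a rough sketch of what the transparent injection described above might look like in practice, a pod could request remote GPU capacity through an annotation and extended resource names. The annotation and resource names below are illustrative assumptions and should be checked against the Bitfusion device plugin documentation:

```yaml
# Hypothetical pod requesting a partial remote GPU via Bitfusion.
# The webhook watches for the annotation and injects the Bitfusion
# client into the container at admission time.
apiVersion: v1
kind: Pod
metadata:
  name: tf-benchmark
  annotations:
    auto-management/bitfusion: "all"   # illustrative: opt in to client injection
spec:
  containers:
    - name: tensorflow
      image: tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          bitfusion.io/gpu-amount: 1   # illustrative extended resource names
          bitfusion.io/gpu-percent: 50 # e.g. half of one GPU's memory
```

The key point is that the pod itself carries no GPU driver or scheduling logic; the host running it needs no physical GPU, because the injected client forwards CUDA calls to the GPU server pool over the network.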


An increasing number of applications and services can benefit from GPUs, yet costs and other constraints often prohibit installation in all compute hosts. “Landlocked” GPU resources often lead to underutilized cycles and wasted spending. This session will describe how a pool of available GPU resources within a vSphere cluster can be shared across a broader number of Kubernetes cluster nodes to accelerate workloads like AI, deep learning, and inference. This can provide full or partial GPU compute capacity at scale to Kubernetes workloads, even when these are running in pods on hosts without an installed GPU. The session will show an example based on running a TensorFlow workload on Knative. The K8s VMware User Group shares best practices for hosting K8s on VMware infrastructure, and we will close the session with details on how you can participate in the group.
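A Knative deployment of the kind the demo describes could look roughly like the following. This is a sketch, not the actual demo configuration; the image name and annotation are placeholders:

```yaml
# Hypothetical Knative Service serving a TensorFlow model, with GPU
# access delegated to a remote Bitfusion pool rather than local hardware.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: tf-inference
spec:
  template:
    metadata:
      annotations:
        auto-management/bitfusion: "all"  # illustrative: request client injection
    spec:
      containers:
        - image: registry.example.com/tf-serving:latest  # placeholder image
          ports:
            - containerPort: 8501  # TensorFlow Serving REST port
```

Because Knative scales the revision's pods up and down with request load, pairing it with pooled remote GPUs means idle replicas do not pin an entire physical GPU, which is the efficiency argument the session makes.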