An increasing number of applications and services can benefit from GPUs, yet cost and other constraints often prohibit installing them in every compute host. These "landlocked" GPU resources often lead to underutilized cycles and wasted spend. This session will describe how a pool of available GPU resources within a vSphere cluster can be shared across a broader set of Kubernetes cluster nodes to accelerate AI workloads such as deep learning training and inference. This approach can provide full or partial GPU compute capacity at scale to Kubernetes workloads, even when those workloads run in pods on hosts without a locally installed GPU. The session will demonstrate an example of running a TensorFlow workload on Knative. The K8s VMware User Group shares best practices for hosting K8s on VMware infrastructure, and we will close the session with details on how you can participate in the group.
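As a taste of the kind of workload the session walks through, the minimal sketch below (a hypothetical check, assuming a TensorFlow 2.x image in the pod) shows how a pod could confirm whether it sees GPU capacity, whether local or attached from a shared pool, before running a small computation on it:

    import tensorflow as tf

    # List the GPU devices TensorFlow can see from inside the pod. With a
    # shared GPU attached from the vSphere pool, this can be non-empty even
    # though the underlying host has no GPU installed.
    gpus = tf.config.list_physical_devices("GPU")
    print(f"Visible GPUs: {gpus}")

    # Run a small matrix multiply, preferring the GPU when one is visible.
    device = "/GPU:0" if gpus else "/CPU:0"
    with tf.device(device):
        result = tf.matmul(tf.random.normal([1024, 1024]),
                           tf.random.normal([1024, 1024]))
    print(f"matmul on {device}, result shape: {result.shape}")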