## Overview

Production machine learning models come in different sizes, shapes, and flavours when deployed on cloud-native infrastructure, each with varying hardware (and software) requirements. Whether it is RAM, CPU, GPU, or disk space, there is no single optimal configuration for all your models' training and inference.

In this talk we will cover the motivations and concepts behind general benchmarking in software, as well as the nuanced requirements for applying these concepts to machine learning systems. We will learn about the theory of benchmarking machine learning models specifically, and the parameters that need to be accounted for, including latency, throughput, spikes, performance percentiles, and outliers, among others.

We will then dive into a hands-on example, benchmarking a model across multiple parameters to identify optimal performance on specific hardware using Argo, Kubernetes, and Seldon Core.
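
To make the metrics above concrete, the following is a minimal sketch of the kind of measurement loop such a benchmark performs. The endpoint URL and payload shape (`MODEL_URL`, `PAYLOAD`) are hypothetical placeholders rather than a specific Seldon Core API, and the sequential loop is a simplification of a real load generator:

```python
import time
import requests
import numpy as np

# Hypothetical endpoint and payload; adapt to your model server's API.
MODEL_URL = "http://localhost:8080/predict"
PAYLOAD = {"instances": [[0.1, 0.2, 0.3, 0.4]]}

def benchmark(n_requests: int = 200) -> None:
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        requests.post(MODEL_URL, json=PAYLOAD, timeout=10)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    lat_ms = np.array(latencies) * 1000  # seconds -> milliseconds
    print(f"throughput: {n_requests / elapsed:.1f} req/s")
    # Tail percentiles surface the spikes and outliers that a mean hides.
    for p in (50, 90, 99):
        print(f"p{p} latency: {np.percentile(lat_ms, p):.1f} ms")

if __name__ == "__main__":
    benchmark()
```

In a Kubernetes setting, a loop like this would typically run as one step of an Argo Workflow, repeated across deployment configurations (for example, different resource limits or replica counts) so the resulting percentiles can be compared per hardware profile.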