How Cookpad Leverages Triton Inference Server To Boost Their Model Serving


Authors:   Jose Navarro, Prayana Galih


The adoption of MLOps practices and tooling by organizations has considerably reduced the pain points to productionise Machine Learning models. However, with the increase of the number of models available by a company to deploy, the diversity of frameworks used to train those models and the different infrastructure required to run each model, new challenges arise for Machine Learning Platform teams e.g: How can we deploy new models from the same or different frameworks concurrently? How can we improve throughput and optimize resource utilization in our serving infrastructure, especially GPUs? Cookpad ML Platform Engineers will talk in this session how Triton Inference Server, an open-source model serving tool from Nvidia, can simplify the process of model deployment and optimise the resource utilisation by efficiently supporting concurrent models on single GPU or CPU, and multi-GPU servers.Click here to view captioning/translation in the MeetingPlay platform!