
Integrating High Performance Feature Stores with KServe Model Serving

2022-06-23

Authors: Chin Huang, Ted Chang


Summary

Overview of KServe with ModelMesh and a demo of model inference using online features
  • KServe is a standards-based model serving platform built on top of Kubernetes
  • ModelMesh in KServe is designed to address Kubernetes' resource limitations and enables high density and scalability
  • The ModelMesh architecture consists of ServingRuntime deployments, containers for the model-mesh logic, adapters for retrieving models, and model servers for inference
  • A scalability test showed that 20k simple stream models could be deployed into two ServingRuntime pods in a small Kubernetes cluster with limited resources
  • The demo showed the integration of the open source ModelMesh model serving layer with Feast for multi-region model serving in a Kubernetes cluster
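Because ModelMesh multiplexes many models onto a shared pool of ServingRuntime pods, each model is registered as a lightweight InferenceService custom resource rather than its own deployment. Below is a minimal sketch of registering one such model with the kubernetes Python client, assuming a ModelMesh-enabled KServe installation; the namespace, storage key, and model path are illustrative placeholders, not values from the talk.

```python
# Register a model with ModelMesh by creating an InferenceService resource.
# A minimal sketch; namespace, storage key, and path are assumptions.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "example-sklearn-isvc",
        # This annotation routes the model to ModelMesh instead of a
        # dedicated per-model KServe deployment.
        "annotations": {"serving.kserve.io/deploymentMode": "ModelMesh"},
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                # Illustrative storage reference; points at a bucket/key
                # configured for the ModelMesh install.
                "storage": {
                    "key": "localMinIO",
                    "path": "sklearn/mnist-svm.joblib",
                },
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="modelmesh-serving",
    plural="inferenceservices",
    body=inference_service,
)
```

Since the InferenceService is only metadata pointing at model storage, thousands of these resources can be created against the same small set of ServingRuntime pods, which is what the 20k-model scalability test exercised.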

Abstract

Having access to a consistent set of dataset features during the different phases of the ML lifecycle is becoming critical. Companies that build and deploy machine learning models may need to manage hundreds of features, and they may even need the latest feature values for real-time prediction. Feast (Feature Store) tackles these problems by providing a standard, high-performance Go-based SDK for retrieving the features needed for distributed model serving. In this talk, attendees will learn how to use Feast to build a production-ready feature store on Kubernetes that serves features to models. Additionally, attendees will see how Feast can be used with KServe, a serverless model inference engine, to retrieve stored features in real time. We hope to share how users can get started with Feast on Kubernetes to meet mission-critical, high-performance inference needs. Here, we set up an end-to-end demo using a Feast KServe transformer on Kubernetes to demonstrate how online features can be served to KServe for real-time inference.
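The transformer pattern described above places a small preprocessing service between the client and the predictor: it receives entity ids, looks up the latest feature values in Feast's online store, and forwards the enriched payload to the model. Below is a minimal sketch of such a transformer using the KServe Python SDK; the "driver_hourly_stats" feature view, the "driver_id" entity key, and the host and repo paths are hypothetical examples, not the demo's actual configuration.

```python
# A minimal sketch of a Feast-backed KServe transformer, assuming a Feast
# repo with a hypothetical "driver_hourly_stats" feature view keyed on
# "driver_id". The transformer enriches incoming entity ids with online
# features before forwarding the request to the predictor.
from typing import Dict

from feast import FeatureStore
from kserve import Model, ModelServer


class FeastTransformer(Model):
    def __init__(self, name: str, predictor_host: str, feast_repo: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.store = FeatureStore(repo_path=feast_repo)

    def preprocess(self, inputs: Dict, headers: Dict[str, str] = None) -> Dict:
        # Fetch the latest feature values from the online store for each
        # entity id in the request payload.
        entity_rows = [{"driver_id": i} for i in inputs["instances"]]
        features = self.store.get_online_features(
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
            ],
            entity_rows=entity_rows,
        ).to_dict()
        # Reshape the per-feature columns into per-row instances that the
        # predictor expects.
        rows = zip(features["conv_rate"], features["acc_rate"])
        return {"instances": [list(row) for row in rows]}


if __name__ == "__main__":
    model = FeastTransformer(
        name="driver-model",
        predictor_host="driver-model-predictor.default.svc.cluster.local",
        feast_repo="/mnt/feast-repo",
    )
    ModelServer().start([model])
```

With this design, feature freshness is handled entirely by Feast's online store, so the model server itself stays stateless and the same transformer can front predictors in multiple regions.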

Materials: