Overview of KServe with ModelMesh and a demo of model inference using online features
KServe is a standards-based model serving platform built on top of Kubernetes
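As an illustration of that standards-based interface, a minimal KServe InferenceService resource might look like the sketch below; the resource name and storage URI are placeholders, not details from the session:

```yaml
# Hypothetical example of a minimal KServe InferenceService.
# The metadata name and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/example
```

Applying a manifest like this asks KServe to stand up a predictor for the model at the given storage location and expose it behind a standard inference endpoint.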
ModelMesh in KServe is designed to work around Kubernetes resource limitations, enabling high-density, scalable model deployment
The ModelMesh architecture includes serving runtime deployments, containers running the model-mesh routing logic, adapters that retrieve models from storage, and model servers that perform the actual inference
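One way to picture the serving runtime piece is through a ServingRuntime custom resource; a hedged sketch follows, where the runtime name and container image are assumptions (the model-mesh and adapter containers described above are injected by the controller alongside the model server container declared here):

```yaml
# Hypothetical ModelMesh ServingRuntime sketch; names and image are placeholders.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-mlserver
spec:
  supportedModelFormats:
    - name: sklearn
      autoSelect: true
  multiModel: true          # one runtime pod serves many models
  grpcDataEndpoint: port:8001
  containers:
    - name: mlserver        # the model server; mesh + adapter are injected
      image: example.io/mlserver:latest
```

The `multiModel: true` setting is what allows many models to share a small number of runtime pods, which is the source of the density described above.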
A scalability test showed that 20,000 simple-string models could be deployed into two serving runtime pods in a small Kubernetes cluster
The demo showed the open-source ModelMesh model serving layer integrated with Feast for multi-region model serving in a Kubernetes cluster
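The demo's flow, fetching online features from Feast and then calling the served model, can be sketched in Python. The feature names, entity key, model name, and endpoint URL below are assumptions, and the payload helper is a hypothetical utility rather than part of either project; only the overall pattern reflects the demo:

```python
import json
import urllib.request


def features_to_v2_payload(feature_dict, input_name="input-0"):
    """Pack Feast online feature values into a KServe v2 (Open Inference
    Protocol) REST request body.

    feature_dict maps feature name -> list of values (one per entity row),
    as returned by get_online_features(...).to_dict() with entity key
    columns removed.
    """
    names = sorted(feature_dict)  # stable column order
    rows = list(zip(*(feature_dict[n] for n in names)))
    return {
        "inputs": [{
            "name": input_name,
            "shape": [len(rows), len(names)],
            "datatype": "FP32",
            "data": [float(v) for row in rows for v in row],
        }]
    }


def predict(url, payload):
    """POST the payload to a v2 inference endpoint (hypothetical URL)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# In the demo setting this would be preceded by something like
# (feature and entity names here are assumed, not from the talk):
#   from feast import FeatureStore
#   store = FeatureStore(repo_path=".")
#   online = store.get_online_features(
#       features=["driver_stats:avg_trips"],
#       entity_rows=[{"driver_id": 1001}],
#   ).to_dict()
#   online.pop("driver_id", None)
#   result = predict("http://modelmesh:8008/v2/models/example/infer",
#                    features_to_v2_payload(online))
```

Keeping feature retrieval and inference as separate steps mirrors the layered setup in the demo: Feast owns the online feature values, while ModelMesh only sees a standard v2 inference request.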