Authors: Dan Sun, Theofilos Papapanagiotou
2023-04-21

tldr - powered by Generative AI

KServe is a model serving platform for deploying machine learning models, including large language models with billions of parameters. It supports straightforward deployment and management of models, along with observability and analysis of model performance.
  • KServe allows for easy deployment and management of machine learning models (a minimal deployment sketch follows this list)
  • It can serve large language models with billions of parameters
  • KServe exposes observability features for analyzing model performance
  • KServe's roadmap includes support for even larger language models
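As a rough illustration of that deployment flow (not code from the talk), here is a minimal sketch using KServe's Python SDK to create an InferenceService; the model name, namespace, and storage URI are hypothetical placeholders:

```python
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
    constants,
)

# Hypothetical example: an InferenceService serving a scikit-learn model.
# The name, namespace, and storage_uri are placeholders, not from the talk.
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="models"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://example-bucket/models/sklearn/iris"
            )
        )
    ),
)

# Create the InferenceService against the current Kubernetes context.
KServeClient().create(isvc)
```

Once the service reports ready, KServe exposes it behind an HTTP endpoint, and the same pattern extends to other predictor specs (PyTorch, Triton, and so on).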
Authors: Chin Huang, Ted Chang
2022-06-23

tldr - powered by Generative AI

Overview of KServe with ModelMesh and a demo of model inference using online features from a feature store.
  • KServe is a standards-based model serving platform built on top of Kubernetes
  • ModelMesh in KServe is designed to address Kubernetes' per-model resource overhead, packing many models into few pods for high density and scalability
  • The ModelMesh architecture consists of serving runtime deployments, containers for the model mesh logic, adapters for retrieving models, and model servers for inference
  • A scalability test showed that 20k simple-string models could be deployed into two serving runtime pods in a small Kubernetes cluster
  • The demo integrated the open source ModelMesh serving layer with Feast online features for multi-region model serving in a Kubernetes cluster (a sketch follows this list)
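To make the Feast integration concrete, here is a minimal sketch, not the demo's actual code, that fetches online features with Feast's Python SDK and forwards them to a ModelMesh-served model over KServe's v2 REST protocol; the repo path, feature names, entity key, endpoint host, and model name are all assumed for illustration:

```python
import requests
from feast import FeatureStore

# Fetch online features for one entity. The feature view, feature names,
# and entity key below are hypothetical placeholders.
store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=[
        "driver_stats:avg_daily_trips",
        "driver_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Assemble a KServe v2 protocol request from the retrieved feature values.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 2],
            "datatype": "FP32",
            "data": [
                float(features["avg_daily_trips"][0]),
                float(features["acc_rate"][0]),
            ],
        }
    ]
}

# POST to the v2 inference endpoint; host and model name are placeholders.
resp = requests.post(
    "http://modelmesh-serving.models:8008/v2/models/example-model/infer",
    json=payload,
)
print(resp.json())
```

The point of the pattern is that the serving layer only ever sees tensors: feature freshness is handled by the feature store, while ModelMesh decides which pod actually holds the model at request time.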
Authors: Animesh Singh
2021-10-14

tldr - powered by Generative AI

KServe is a highly scalable, standards-based model inference platform on Kubernetes for trusted AI. It addresses the challenges of deploying machine learning models in production systems.
  • Deploying machine learning models in production is difficult: teams must weigh the cost of deployment, monitoring, security, and scalability.
  • KServe addresses these challenges by providing a highly scalable, standards-based model inference platform on Kubernetes for trusted AI.
  • KServe integrates with multiple popular model servers in the industry and supports various machine learning frameworks.
  • KServe defines a standard inference protocol to provide a unified user experience and integrate easily with multiple model servers (a sketch follows this list).
  • KServe addresses scalability limitations by reducing per-model resource overhead and deploying multiple models in one inference service.
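To illustrate what that unified protocol buys, the sketch below sends the same v2 (open inference protocol) requests regardless of which backend model server hosts each model; the host and model names are hypothetical:

```python
import requests

BASE = "http://example-host"  # hypothetical KServe ingress host

# The same v2 endpoints work for any protocol-compliant backend
# (e.g. Triton, TorchServe, MLServer), so the client code never changes.
for model in ("sklearn-model", "pytorch-model"):  # hypothetical names
    # Model metadata: name, platform, input/output tensor signatures.
    meta = requests.get(f"{BASE}/v2/models/{model}").json()
    print(model, meta.get("inputs"))

    # Inference uses the same request shape for every backend.
    resp = requests.post(
        f"{BASE}/v2/models/{model}/infer",
        json={
            "inputs": [
                {
                    "name": "input-0",
                    "shape": [1, 4],
                    "datatype": "FP32",
                    "data": [6.8, 2.8, 4.8, 1.4],
                }
            ]
        },
    )
    print(model, resp.json()["outputs"][0]["data"])
```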