Human-Friendly, Production-Ready Data Science Stack With Metaflow & Kubernetes

2022-10-27

Authors: Saravanan Balasubramanian, Savin Goyal


Summary

The presentation discusses the challenges of introducing machine learning into applications and argues for infrastructure that covers the entire machine learning life cycle end to end. It also covers the role of workflow orchestration and reproducibility.
  • Introducing machine learning into applications successfully requires infrastructure that supports the entire life cycle end to end
  • Workflow orchestration is important for productionizing machine learning workflows
  • Reproducibility is important for ensuring trust in machine learning models
  • Model deployment can mean many different things depending on the business context
The presentation uses Netflix as its example: the need for end-to-end life-cycle infrastructure was recognized there and addressed through the development of Metaflow. Workflow orchestration and reproducibility are presented as answers to the need for reliable and trustworthy models.

Abstract

There is a pressing need for tools and workflows that meet data scientists where they are. This is also a serious business need: how to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently.

In this talk, we discuss the problem space and the approach we took to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics and drones to real estate. We wanted to provide the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.

In this talk, we will demo our latest work that builds on top of Kubernetes.

You will learn about:
  • What to expect from a modern ML infrastructure stack.
  • Using tools such as Metaflow & Kubernetes to boost the productivity of your data science organization, based on lessons learned from Netflix and many other companies.
